ABSTRACT


We develop statistical models to identify the entry-level attributes of a
Marine recruit that are most influential in predicting two performance measures: the
Computed Tier Score and the time to achieve the rank of Corporal (E-4) in the 0621
Field Radio Operator Military Occupational Specialty (MOS). We use data collected
from 2007 through 2014 on more than 1,100 Marines in the 0621 MOS to construct
multivariate linear regression models that estimate Marines’ Computed Tier Score and
time to achieve E-4 based on their individual personal and professional attributes.

We find that statistically significant relationships exist between the entry-level
attributes of a Marine recruit and the performance measures. The most influential
predictor variables include the run time on the USMC Initial Skills Test (IST), number of
crunches on the IST, rifle score, the Armed Services Vocational Aptitude Battery
(ASVAB) General Technical (GT) score, ASVAB Clerical (CL) score, ASVAB General
Science (GS) score, ASVAB Mathematics Knowledge (MK) score, ASVAB Paragraph
Comprehension (PC) score, weight, and whether a Marine receives a weight waiver upon
entrance into service. We recommend that new job performance measures be created for
each high-density MOS in order to conduct further testing for MOS suitability.

EXECUTIVE SUMMARY


Each year, the United States Marine Corps (USMC) accesses thousands of new recruits
into a variety of career fields, an assignment process that has significant implications for
the USMC and the individual Marine’s future career path. The USMC expends
considerable manpower and time ensuring that annual Military Occupational Specialty
(MOS) recruiting targets are met while trying to best match each recruit to those
requirements. This research aims to provide the Marine Corps with an understanding of
relationships between entry-level attributes of Marine recruits and two performance
measures in order to better select the right recruits for each MOS. We develop statistical
models to identify the most influential entry-level attributes of a Marine recruit in
predicting two performance measures: the Computed Tier Score captured at the time of
re-enlistment eligibility, and the time to achieve the rank of Corporal (E-4) in the 0621
Field Radio Operator MOS in the USMC.

Using data collected from 2007 through 2014 on more than 1,100 Marines in the
0621 MOS, multivariate linear regression models are developed to predict a Marine’s
Computed Tier Score and time to achieve E-4 based on their individual personal and
professional entry-level attributes. These attributes, which include physical
characteristics, test scores, physical fitness measures, education, and waiver information,
comprise the independent variables in the study. This study answers the following
questions:

1. Do significant relationships exist between entry-level attributes of a
USMC recruit and the USMC Computed Tier Score or the time for a
Marine to achieve the pay grade of E-4?

2. What are the most influential independent variables that predict the
Computed Tier Score and the time to promotion to E-4 in a particular
MOS field?

3. What insight does this analysis provide in terms of recommending changes
to the current entrance criteria for the 0621 Field Radio Operator MOS?

4. What direction should a future study take to examine ways in which the
matching of USMC recruits to MOS fields can be improved?

We find that statistically significant relationships do exist between the entry-level
attributes of a Marine recruit and the USMC Computed Tier Score, as well as the time to
achieve the pay grade of E-4 within the 0621 MOS in the USMC. Entry-level attributes
of Marine recruits can be used to predict these dependent variables. Additionally, we
recommend that this analysis be conducted on an annual basis, and not pooled into a
multi-year study, at least for the near future.

The most influential predictor variables for the Computed
Tier Score are found to be the Initial Skills Test (IST) run time, IST crunches, rifle score,
the Armed Services Vocational Aptitude Battery (ASVAB) General Technical (GT)
composite score, weight of a Marine, and whether a Marine received a weight waiver
upon entrance into service. We find the most influential predictor variables for
the time to achieve the pay grade of E-4 to be IST crunches, IST run time, rifle score, the
ASVAB General Science (GS) subscore, ASVAB Mathematics Knowledge (MK)
subscore, ASVAB Paragraph Comprehension (PC) subscore, ASVAB
Clerical/Administrative (CL) composite score, and whether a Marine receives a weight
waiver upon service entrance. While IST crunches, IST run time, rifle score, and weight
provide insight into the predicted time to achieve the pay grade of E-4, the variables GS,
MK, PC, and CL SCORE offer intriguing evidence that the USMC should further
explore these variables for inclusion in the entrance criteria of a Field Radio Operator.

To explore other measures of MOS suitability that could help predict a successful
match, we determine that new suitability measures need to be developed. This study
recommends that new job performance measures be created for each high-density MOS
in order to conduct further testing for MOS suitability. Once new success or job
performance measures are developed, this study can be replicated using those measures
as the dependent variable for analysis.

I. INTRODUCTION


A. MOTIVATION AND OBJECTIVES

Each year, the United States Marine Corps (USMC) accesses thousands of new
recruits into a variety of career fields, an assignment process that has significant
implications for the USMC and the individual Marine’s future career path. While the
process takes into account both the recruit’s preferences and the needs of the
Marine Corps, there is scope for making the assignment process more
efficient. More specifically, there is a continued desire to ensure that recruits are
matched to the right Military Occupational Specialty (MOS). Matching recruits to the
MOS in which they are most likely to succeed and perform at a high level improves
not only the quality of each MOS as a community, but also the USMC as a whole.

The USMC spends considerable manpower and time ensuring that annual MOS
recruiting targets are met while trying to best match each recruit to those requirements.
Currently, the USMC uses various entrance criteria to ensure that Marines are
qualified to enter a specific MOS field. Headquarters Marine Corps (HQMC), D.C.
Manpower and Reserve Affairs (M&RA) is investigating ways to improve the career
field assignment process and seeks to explore the possible relationships between recruit
attributes and potential indicators of success in the assigned MOS field.

This research aims to provide the Marine Corps with a better understanding of
relationships between recruit attributes and possible indicators of success in a particular
MOS in order to better select the right recruits for the right MOS. Through identification of
key attributes that lead to success, the USMC can modify the current MOS assignment
process in order to use the right human capital while meeting the needs of the Marine
Corps. More specifically, entrance criteria for specific MOSs can be changed or validated
to ensure the Marines with the highest likelihood of success are placed in the appropriate
MOS. This research could also be used to help the USMC decide how to allocate recruits
to specialties to meet numerical targets in those specialties.

B. FOCUS OF THE RESEARCH

The  primary  focus  of  this  research  is  to  develop  a  research  concept,  data 
collection  plan,  and  repeatable  methodology  that  improve  M&RA’s  understanding  of 
relationships  between  recruit  attributes  and  their  success  within  the  assigned  MOS.  The 
end-state  goal  is  to  determine  the  entry-level  recruit  attributes  that  lead  to  the  most 
success  in  specific  MOSs  in  order  to  validate  or  recommend  change  to  the  current 
entrance  criteria  for  high-density  or  priority-fill  MOSs. 

This  study  focuses  specifically  on  the  0621  Field  Radio  Operator  MOS,  due  to  the 
stringency  of  the  entrance  requirements  for  this  MOS,  the  technicality  of  the 
requirements  necessary  to  perform  successfully  in  the  MOS,  and  existence  of  a 
significant  yearly  sample.  During  the  course  of  this  study,  statistical  models  are 
constructed  to  estimate  the  relationships  between  entry-level  attributes  and  two  measures 
of  perceived  success  within  the  0621  MOS.  The  models  are  based  on  a  set  of  variables  or 
attributes  that  are  available  through  a  USMC  Manpower  database  known  as  the  Total 
Force  Data  Warehouse  (TFDW). 

Our investigation is organized as follows. First, we conduct exploratory analysis
of the data to identify data characteristics, such as missing or invalid
observations, and to obtain a basic understanding of the relationships
between variables. Next, we use linear regression to construct models to predict two
dependent variables: time to achieve the pay grade of E-4 and the USMC
Computed Tier Score. The Computed Tier Score is a quantitative performance metric that
provides commanders an assessment of an individual Marine’s performance for
re-enlistment eligibility. Finally, we make recommendations for future study that will
provide the most benefit to the career field assignment process.
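
As an illustration of the regression step, the following is a minimal sketch of an ordinary least squares fit solved through the normal equations. It is not the software or model specification used in this study: the function name `fit_ols` and the pure-Python solver are assumptions made for illustration, and any predictor columns passed to it (for example, a run time and a GT score) are hypothetical inputs.

```python
def fit_ols(X, y):
    """Ordinary least squares: return [intercept, b1, ..., bk].

    X is a list of rows, one per observation, each holding k predictor values.
    y is the list of observed values of the dependent variable.
    """
    # Prepend a constant 1.0 to every row so the first coefficient is the intercept.
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])

    # Build the augmented normal equations [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [aug[i][k] / aug[i][i] for i in range(k)]
```

Each row of `X` holds one Marine's predictor values and `y` holds the dependent variable, such as time to achieve E-4; the returned list gives the intercept followed by one coefficient per predictor.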

This study answers the following questions:

1. Do significant relationships exist between entry-level attributes of a
USMC recruit and the USMC Computed Tier Score or the time for a
Marine to achieve the pay grade of E-4?

2. What are the most influential independent variables that predict the
Computed Tier Score and the time to promotion to E-4 in a particular
MOS field?

3. What insight does this analysis provide in terms of recommending changes
to the current entrance criteria for the 0621 Field Radio Operator MOS?

4. What direction should a future study take to examine ways in which the
matching of USMC recruits to MOS fields can be improved?

C. ORGANIZATION OF THIS THESIS

This thesis is organized as follows. In Chapter II, we review literature on the
career assignment process in the USMC, and we discuss the methodologies and findings
of those studies. Additionally, Chapter II provides a detailed background on the current
process for career assignment of enlisted Marines and an overview of the current MOS
entrance criteria. Chapter III describes the data and methodology used to conduct this
study. It includes a description of the data used for analysis and an explanation of the data
collection and cleansing process. Chapter IV discusses the results and the analysis used to
achieve those results. Chapter V provides the conclusions of this study and
recommendations for future work.

II. BACKGROUND


A. THE PROCESS FOR CAREER ASSIGNMENT OF ENLISTED MARINES IN THE U.S. MARINE CORPS

Each branch of the Armed Services uses specific entrance criteria for screening
recruits and assigning them to an MOS. Headquarters Marine Corps (HQMC) Manpower
and Reserve Affairs (M&RA) conducts detailed analyses in order to determine the
manning requirements for each MOS and to meet the current needs of the Marine Corps.
Based on these manning requirements, recruits are then assigned into the required
occupational specialties to match the demand. Prerequisites for entrance into specific
MOS fields are defined in Marine Corps Order (MCO) 1200.17E, the Military
Occupational Specialties Manual (Short Title: MOS Manual) (USMC, 2013). The
prerequisites for each MOS were originally constructed to match the best
recruit to each occupational field, but are not necessarily updated when the job specialties
change.

Traditionally, the Armed Services Vocational Aptitude Battery (ASVAB)
composite test scores have been the most important delineating factor in matching an
individual to an MOS. A recruit’s test scores, background information (citizenship, security
clearance eligibility, etc.), preferences, and the needs of the Marine Corps are considered
in the determination of MOS assignments. Marine recruits are assigned an Intended MOS
(IMOS) approximately two weeks prior to graduating basic training. They are then
forwarded to their assigned MOS school for initial MOS training. Upon graduating from
MOS school, each Marine is officially assigned his or her Primary MOS (PMOS)
designator.

B. THE ARMED SERVICES VOCATIONAL APTITUDE BATTERY (ASVAB) AND MOS ENTRANCE CRITERIA

This section describes the Armed Services Vocational Aptitude Battery
(ASVAB), which is a series of examinations that the Armed Forces use to set
requirements to enter into service in the U.S. military. For the U.S. Marine Corps, the
ASVAB also determines entrance requirements that must be met in order to be assigned
to a particular MOS.

1.  ASVAB  Components 

The U.S. military has been screening potential recruits for aptitude since World
War I. In 1976, all military services began using the ASVAB for both screening potential
recruits for service entrance and assigning them to military occupations. Combining the
selection and classification testing into one exam made the testing process more efficient
while also enabling the military services to better match recruits to MOSs. The ASVAB
has been revised many times in order to correct inefficiencies and problems with
misnorming (History of Military Testing, n.d.).

The ASVAB consists of ten subtests, each of which provides its own score.
There are two versions of the ASVAB, a paper and pencil (P&P) version and a
computerized adaptive test (CAT) version. The P&P-ASVAB combines two of the
subtests, Auto Information (AI) and Shop Information (SI), into a single test, Auto and
Shop Information (AS). These subtests are displayed in Table 1 (ASVAB Fact Sheet,
n.d.).

Prospective recruits are screened for entrance into the military by calculating a
composite score, called the Armed Forces Qualification Test (AFQT). The AFQT
incorporates the following four ASVAB subtests: Paragraph
Comprehension (PC), Word Knowledge (WK), Mathematics Knowledge (MK), and
Arithmetic Reasoning (AR). The AFQT score is reported as a percentile between 1 and 99,
which indicates the percentage of examinees that scored at or below that score
(ASVAB Scoring, n.d.). The current minimum AFQT score for entrance into the USMC
is 32 for high school graduates and 50 for persons with a GED (ASVAB Scoring, n.d.).
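
The screening rule described above can be sketched as follows. This is only an illustration under stated assumptions: the official AFQT percentile is derived from a reference population and scoring procedure not described here, so `percentile_rank` simply counts scores in a supplied reference list, and the function names and credential labels are hypothetical. Only the minimums of 32 and 50 come from the text.

```python
from bisect import bisect_right

# Minimum AFQT percentiles quoted in the text (ASVAB Scoring, n.d.).
# The dictionary keys are hypothetical labels chosen for this sketch.
MIN_AFQT = {"high_school": 32, "ged": 50}


def percentile_rank(score, reference_scores):
    """Percentage of the reference group scoring at or below `score`,
    clamped to the reported range of 1 to 99."""
    ordered = sorted(reference_scores)
    at_or_below = bisect_right(ordered, score)
    pct = round(100 * at_or_below / len(ordered))
    return max(1, min(99, pct))


def meets_afqt_minimum(afqt_percentile, credential):
    """True when the percentile meets the USMC minimum for the credential."""
    return afqt_percentile >= MIN_AFQT[credential]
```

Under these assumptions, a recruit with a GED and an AFQT percentile of 45 would not meet the minimum, while a high school graduate with the same percentile would.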

Table 1. ASVAB subtests (after ASVAB Fact Sheet, n.d.)

Test                              Description
General Science (GS)              Knowledge of physical and biological sciences
Arithmetic Reasoning (AR)         Ability to solve arithmetic word problems
Word Knowledge (WK)               Ability to select the correct meaning of a word
                                  presented in context and to identify the best
                                  synonym for a given word
Paragraph Comprehension (PC)      Ability to obtain information from written passages
Mathematics Knowledge (MK)        Knowledge of high school mathematics principles
Electronic Information (EI)       Knowledge of electricity and electronics
Auto Information (AI)             Knowledge of automobile technology
Shop Information (SI)             Knowledge of tools and shop terminology and practices
Mechanical Comprehension (MC)     Knowledge of mechanical and physical principles
Assembling Objects (AO)           Ability to determine how an object will look when
                                  its parts are put together

2. MOS Entrance Criteria

The Marine Corps uses four other composite scores derived from the ASVAB
subtest scores for determining entrance or assignment for recruits into occupational
specialties. The four USMC composite scores are General Technical (GT), Mechanical
Maintenance (MM), Electronics (EL), and Clerical/Administrative (CL). Each composite
score is formulated from a combination of various ASVAB subtest scores (USMC, 2009).
The composite scores and their derivations are shown in Table 2.


Table 2. U.S. Marine Corps ASVAB composite scores
(after Classification Testing, 2009)

Composite Score                   Score Derivation
General Technical (GT)            WK + PC + AR + MC
Mechanical Maintenance (MM)       AR + EI + MC + AS
Electronics (EL)                  AR + MK + EI + GS
Clerical/Administrative (CL)      WK + PC + MK
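
The derivations in Table 2 can be expressed as a short sketch. It only sums the listed subtest scores; how those sums are converted to the reported composite scale (for example, an EL score of 105) is not described in this section, so the output should be read as unscaled sums. The names `COMPOSITES` and `composite_sums` are hypothetical.

```python
# Subtest combinations as listed in Table 2.
COMPOSITES = {
    "GT": ("WK", "PC", "AR", "MC"),
    "MM": ("AR", "EI", "MC", "AS"),
    "EL": ("AR", "MK", "EI", "GS"),
    "CL": ("WK", "PC", "MK"),
}


def composite_sums(subtests):
    """Sum the subtest scores that make up each composite.

    `subtests` maps subtest abbreviations (e.g. "WK") to scores.
    Returns unscaled sums, not officially reported composite scores.
    """
    return {
        name: sum(subtests[code] for code in parts)
        for name, parts in COMPOSITES.items()
    }
```

Because AR appears in three of the four composites and MK, PC, and WK each appear in two, the composite scores are correlated by construction, which is worth keeping in mind when several of them are candidate predictors in the same model.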

The U.S. Marine Corps assigns recruits to a particular MOS based on specific
entrance criteria or prerequisite requirements. These entrance criteria vary by MOS, and
are set to best match recruits with the right skill sets, knowledge base, physical ability,
and aptitude levels to a corresponding MOS. The job descriptions, prerequisite
requirements, and MOS requirements for each MOS are outlined in MCO 1200.17E,
Military Occupational Specialties Manual (Short Title: MOS Manual) (USMC, 2013).
Descriptions, prerequisites, and requirements for the 0621 MOS (Field Radio Operator)
and the 0311 MOS (Rifleman) are outlined by the MOS Manual as follows:

MOS 0621, Field Radio Operator PMOS


a.  MOS  Description:  Field  Radio  Operators  employ  radios  to  send  and  receive 
messages.  Typical  duties  include  the  set  up  and  tuning  of  radio  equipment 
including  antennas  and  power  sources;  establishing  contact  with  distant  stations; 
processing  and  logging  of  messages;  making  changes  to  frequencies  or 
cryptographic  codes;  and  maintaining  equipment  at  the  first  echelon.  Skill 
progression  training  for  Sergeant  and  Corporal  is  Radio  Supervisors  Course. 

b.  Prerequisites 

(1)  Must  be  a  U.S.  Citizen. 

(2)  Must  possess  an  EL  score  of  105  or  higher. 

(3)  Must  possess  a  valid  state  driver's  license. 

(4)  Security  requirement:  Secret  security  clearance  eligibility. 

c.  Requirements.  Complete  the  Field  Radio  Operator  (FROC)  Course  (after 
USMC,  2013). 


MOS  0311,  Rifleman  PMOS 

a.  MOS  Description:  The  Riflemen  employ  the  modern  service  rifle/carbine,  the 
M203  grenade  launcher  and  the  squad  automatic  weapon  (SAW).  Riflemen  are 
the  primary  scouts,  assault  troops,  and  close  combat  forces  available  to  the  Marine 
Corps  Air  Ground  Task  Force  (MAGTF).  They  are  the  foundation  of  the  Marine 
infantry  organization,  and  as  such  are  the  nucleus  of  the  fire  team  in  the  rifle 
squad, the scout team in the LAR squad, scout snipers in the infantry battalion,
and  reconnaissance  or  assault  team  in  the  reconnaissance  units.  Noncommissioned 
Officers  are  assigned  as  fire  team  leaders,  scout  team  leaders,  rifle  squad  leaders, 
or  rifle  platoon  guides. 

b.  Prerequisites.  Must  possess  a  GT  score  of  80  or  higher. 

c.  Requirements.  Complete  the  Marine  Rifleman  Course  at  the  School  of  Infantry 
(after  USMC,  2013). 


These  two  MOS  descriptions  are  provided  to  emphasize  that  each  USMC  MOS 
has  different  job  descriptions,  prerequisites,  and  requirements.  For  the  purposes  of  this 
study,  it  is  important  to  note  the  prerequisite  requirements  for  entrance  into  a  specific 
MOS.  These  prerequisites  are  the  criteria  that  the  USMC  uses  to  classify  a  recruit  into  an 
MOS. 
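
The two prerequisite lists quoted above can be encoded as a simple check. This is a sketch for illustration only: the `Recruit` record and the function name are hypothetical, only the two MOSs excerpted here are covered, and the course-completion requirements are omitted because they apply after assignment.

```python
from dataclasses import dataclass


@dataclass
class Recruit:
    """Hypothetical record holding only the fields the two excerpts mention."""
    us_citizen: bool
    el_score: int
    gt_score: int
    has_drivers_license: bool
    secret_clearance_eligible: bool


def unmet_prerequisites(recruit, mos):
    """Return the prerequisites from the MOS Manual excerpt that are not met."""
    if mos == "0621":
        checks = [
            (recruit.us_citizen, "U.S. citizen"),
            (recruit.el_score >= 105, "EL score of 105 or higher"),
            (recruit.has_drivers_license, "valid state driver's license"),
            (recruit.secret_clearance_eligible, "Secret clearance eligibility"),
        ]
    elif mos == "0311":
        checks = [(recruit.gt_score >= 80, "GT score of 80 or higher")]
    else:
        raise ValueError(f"no prerequisites encoded for MOS {mos}")
    return [label for passed, label in checks if not passed]
```

An empty list means the recruit satisfies every encoded prerequisite for that MOS; a recruit with an EL score of 100 and a GT score of 85, for instance, would qualify for 0311 but not for 0621.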

C. LITERATURE REVIEW

This  section  reviews  previously  conducted  studies  on  career  assignment  and 
related  subjects  that  are  of  interest  to  the  manpower  community.  More  specifically, 
studies  in  the  following  areas  are  reviewed:  military  career  assignment  and  the 
relationship  between  ASVAB  testing  and  performance  in  an  MOS. 

1.  Previous  Studies  on  Career  Assignment 

Rautio  (2011)  examines  standards  used  to  screen  recruits  for  assignment  to  the 
communications  field  in  the  USMC.  He  discusses  the  relationship  between  ASVAB 
composite  scores  and  success  measures  at  the  communications  occupational  field 
schools.  The  data  used  for  analysis  covers  9,921  Marines  from  fiscal  year  2006  through 
fiscal  year  2009.  The  author  develops  multivariate  probit  regression  models  that  include 
all  four  years  of  data  encompassing  multiple  MOS  fields.  The  probit  models  determine 
the  effects  of  ASVAB  composite  scores  and  other  measures  of  performance  on  success  at 
the  communications  schools  (Rautio,  2011). 

Rautio (2011) considers models that use the following predictor variables:
Gender, Race, Ethnicity, Marital Status, Number of Dependents, Primary MOS, Fiscal
Year, Armed Forces Qualification Test (AFQT) Score, ASVAB composite scores,
Education Level, Proficiency Score, and Conduct Score. The dependent variable
identifies whether a Marine successfully completed the initial communications MOS
school. Rautio (2011) finds that the ASVAB Electronic composite score (EL Score) has a
significantly positive effect on the probability of success at the communications schools.
The author also cites other variables that have a positive effect on the probability of
success such as marital status, ethnicity, and the ASVAB Clerical composite test score.
He also finds that gender and education level are statistically significant contributors to
the prediction of success.

2. Studies on the Relationship between ASVAB Testing and Performance in an MOS

The  Center  for  Naval  Analyses  (CNA)  conducted  a  multi-year  study  (Carey, 
1993)  for  the  Marine  Corps  Job  Performance  Measurement  (JPM)  project  in  order  to 
construct  valid  measures  for  job  performance  and  to  determine  the  relationship  between 
the  ASVAB  and  Marine  job  performance.  The  study  was  conducted  due  to  concern  by 
Congress  that  a  significant  number  of  unqualified  and  low  aptitude  personnel  had  entered 
into  military  service  during  the  1970s.  This  concern  was  supported  by  CNA  studies  that 
discovered  a  misnorming  of  the  ASVAB  that  resulted  in  360,000  recruits  entering  into 
service  that  would  have  been  declared  ineligible  if  the  ASVAB  test  scores  had  been 
accurate.  In  1981,  Congress  mandated  that  each  service  perform  a  Job  Performance 
Measurement  (JPM)  project  in  order  to  relate  ASVAB  scores  to  on-the-job  performance 
(Carey,  1993).  This  study  develops  new  measures  for  performance  and  success  in  order 
to  study  the  relationships  to  predictor  variables. 

CNA executed two phases of the study between 1986 and 1990. The first phase
(1986 to 1987) focuses on job performance measurement for infantry MOSs. The second
phase (1990) focuses on job performance measures for the mechanical maintenance field
(Carey, 1993). Our discussion is limited to the infantry MOS phase of the study.

The  infantry  MOS  study  maps  the  job  duties  of  five  infantry  MOSs  based  on  the 
Marine  Corps  Individual  Training  Standards  (ITS),  now  included  in  the  USMC  Training 
and  Readiness  (T&R)  Manual,  for  infantry  occupations.  The  study  proposes  job 
performance  measures,  hands-on  performance  tests  (HOPTs)  and  job  knowledge  tests 
(JKTs)  that  were  developed  to  directly  test  job  duties  as  outlined  by  the  ITS.  Carey 
(1993)  finds  that  the  HOPTs  proposed  by  CNA  were  effective  measures  of  job 
performance  due  to  their  strong  agreement  with  actual  job  performance  based  on  the 
requirement  that  an  examinee  perform  job-related  tasks  under  realistic  but  standardized 
conditions.  The  JKTs  are  designed  to  be  a  parallel  test  to  the  HOPTs  and  include  written 
exam  knowledge  testing  of  items  related  to  job  performance.  This  study  notes  that 
standardized  HOPTs  are  expensive  and  difficult  to  develop  and  implement.  CNA 
concludes  that  while  the  HOPTs  should  serve  as  the  benchmark  for  measuring  job 
performance, JKTs provide promising replacements for setting enlistment standards.
Marine  Corps  Proficiency  marks  (PRO  marks)  are  also  considered  in  the  study  but  found 
to  provide  less  fidelity  to  actual  job  performance  when  compared  to  the  HOPTs  and  JKTs 
(Carey,  1993). 

In  an  earlier  CNA  study,  Mayberry  (1990)  investigates  the  relationship  between 
these  JPMs  and  ASVAB  composite  scores,  with  particular  focus  on  the  General 
Technical  (GT)  composite  score.  Mayberry  focuses  primarily  on  the  GT  score  because 
the  Marine  Corps  uses  this  score  to  determine  eligibility  for  the  infantry  occupational 
field. 

More  than  2,300  infantrymen  from  five  infantry  MOSs  were  tested  over  the 
course  of  two  days.  Examinees  were  administered  both  the  JKTs  and  the  HOPTs.  The 
results  from  the  performance  testing  are  then  modelled  in  order  to  determine  if 
relationships  exist  between  aptitude,  as  indicated  by  the  ASVAB  composite  scores,  and 
MOS  performance. 

Mayberry  (1990)  finds  a  strong  relationship  between  individual  aptitude  level  and 
later  performance  of  critical  MOS  tasks.  This  study  provides  a  useful  measure  of  MOS 
performance  and  determines  that  the  ASVAB  composite  scores  provide  significant 
indicators  of  performance  within  an  MOS. 

D. CHAPTER SUMMARY

The U.S. Marine Corps MOS assignment process attempts to assign the most
qualified recruits with the most potential for success to the right MOS. The USMC uses
entrance criteria to assign those recruits to an MOS while meeting the needs of the
Marine Corps. Based on the literature reviewed, ASVAB composite scores are useful in
predicting success during MOS school and within the assigned MOS. Additionally, these
studies suggest that the EL composite score is a good predictor of performance in the 0621
Field Radio Operator MOS. Our study focuses on predicting success or performance in a
specific MOS while in the operating forces.

III. DATA AND METHODOLOGY


A. THE DATA

This  section  gives  a  detailed  explanation  of  the  data  collection  process,  the 
original  data  gathered  for  study  purposes,  and  the  preparation  of  the  data  in  order  to 
conduct  a  useful  analysis. 

1. Data Summary

The data used in our research is obtained from the USMC’s Total Force Data
Warehouse (TFDW). TFDW is a database of personnel records for Manpower & Reserve
Affairs. TFDW contains historical information for active duty and reserve Marines in the
USMC. For the purposes of this study, data is pulled from TFDW for all active duty
enlisted Marines with the 0621 MOS designator who entered into active service during
Fiscal Years 2008 through 2010, that is, from 1 October 2007 through 30 September
2010.

The data includes personal and professional information, such as physical
characteristics, physical fitness performance scores, education information,
demographics, waivers received, ASVAB test scores, promotions, marksmanship scores,
and legal information. Some data fields provide a snapshot in time of the Marine’s career
profile and are updated when the information changes, while other data fields are
populated each month. Lastly, there are data fields that are populated only once, such as
information gathered upon entering service. Table 3 gives details on the initial sample
obtained for this study. Duplicate observations, which arise from changes in enlistment
dates and do not differ in other fields, were removed.

Table 3. Summary of Marines entering service in FY2008-FY2010
for the 0621 MOS

Fiscal Year    Observations in Original Sample    Sample with Duplicates Removed
FY2008         429                                377
FY2009         433                                384
FY2010         510                                466

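
The duplicate removal summarized in Table 3 can be sketched as follows. The counts are taken from the table; the field name `marine_id` and the choice to keep the first record seen for each Marine are assumptions made for illustration, since the text states only that duplicates stem from changes in enlistment dates.

```python
# Counts from Table 3: (original sample, sample with duplicates removed).
TABLE_3 = {"FY2008": (429, 377), "FY2009": (433, 384), "FY2010": (510, 466)}


def duplicates_removed(table):
    """Number of duplicate observations dropped in each fiscal year."""
    return {fy: original - kept for fy, (original, kept) in table.items()}


def drop_duplicate_marines(records, id_field="marine_id"):
    """Keep the first record seen for each identifier (hypothetical field name)."""
    seen = set()
    unique = []
    for record in records:
        if record[id_field] not in seen:
            seen.add(record[id_field])
            unique.append(record)
    return unique
```

Applied to Table 3, the subtraction gives 52, 49, and 44 duplicates removed for FY2008, FY2009, and FY2010, leaving 1,227 of the original 1,372 observations.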

2. Data Formatting and Cleaning

This section explains the procedures taken to prepare the data for analysis,
including an explanation of the observations that were removed from the analysis and the
grouping of categorical variables.

a. Observation Removal, Variable Substitution, and Censoring

In  order  to  properly  build  relevant  analytical  models,  the  historical  information  for 
each  Marine’s  record  should  contain  complete  information  for  each  of  the  predictor 
variables.  When  missing  or  invalid  information  exists  for  a  predictor  variable,  we  remove 
those  records  from  the  analysis. 

When  there  are  missing  values  for  the  dependent  variables  included  in  the  study, 
we  first  determine  if  there  is  a  valid  reason  and  possible  substitution  value  for  the 
variable.  If  no  valid  substitute  exists,  the  records  with  missing  values  are  removed. 
Reasons  for  missing  values  include  Marines  separated  from  service  prior  to  the 
conclusion  of  their  enlistment  and  deployment  waivers.  These  exclusions  are  shown  in 
Table  4. 

The first dependent variable considered, the Computed Tier Score, is calculated as
a combination of seven sub-variables. The Computed Tier Score is discussed in greater
detail later in this chapter. One of the sub-variables of the Computed Tier Score, martial
arts belt level, contains missing values in the data for 2008 and 2009. We decided to
replace these missing values with the belt level closest to the median belt level of all
records. We assume that each Marine has received, at minimum, the median belt level
due  to  USMC  requirements  to  achieve  certain  belt  levels  during  career  progression. 
Additionally,  this  assumption  has  minimal  impact  on  the  overall  values  of  Computed  Tier 
Score,  but  allows  us  to  include  more  observations  for  analysis.  The  number  of 
observations  with  a  substitute  value  for  martial  arts  belt  level  is  included  in  Table  4. 
Fiscal  Year  2010  did  not  contain  any  observations  that  required  substitution  for  martial 
arts  belt  level. 
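
As an illustration only, the substitution rule described above can be sketched in a few lines of Python. The belt names, their ordering, and the tie-breaking rule (an exact tie goes to the lower belt) are assumptions made for this sketch; the study does not state how belt levels are coded or how ties are resolved.

```python
from statistics import median

# Assumed ordinal coding of belt levels, lowest to highest (illustrative only).
BELT_ORDER = ["TAN", "GRAY", "GREEN", "BROWN", "BLACK"]


def impute_belts(belts):
    """Replace None entries with the belt level closest to the median level."""
    ranks = [BELT_ORDER.index(b) for b in belts if b is not None]
    med = median(ranks)  # can fall halfway between two ranks
    # Closest valid rank to the median; an exact tie goes to the lower belt.
    fill_rank = min(range(len(BELT_ORDER)), key=lambda r: (abs(r - med), r))
    fill = BELT_ORDER[fill_rank]
    return [fill if b is None else b for b in belts]


# Hypothetical records; None marks a missing belt level.
records = ["TAN", "GRAY", None, "GREEN", "GRAY", None, "TAN"]
imputed = impute_belts(records)
print(imputed)
```

Because only the missing entries are filled, the observed belt levels are left untouched and the affected records can be retained for analysis.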

The  second  dependent  variable  included  in  the  analysis  is  the  time  in  days  that  it 
takes  for  a  Marine  to  promote  to  the  pay  grade  of  E-4,  or  time2E4.  After  removal  of 
records  for  Marines  separated  from  active  service,  missing  values  still  exist  for  Marines 
that  are  not  promoted  to  E-4  prior  to  completion  of  their  first  four  years  of  active  service. 
Although  these  Marines  never  achieved  the  pay  grade  of  E-4  during  their  period  of 
observation  in  the  study,  a  censored  value  is  substituted  for  these  records  for  time2E4  in 
order  to  retain  these  important  observations  for  study.  The  censored  time2E4  value  is 
equal  to  one  plus  the  maximum  observed  time  for  the  cases  that  we  considered,  which  is 
1570 days. Twenty records from the FY2010 data (approximately five percent of the total
number of records) have time2E4 set to this censoring value. Figure 1 shows a histogram
of time2E4 for the FY2010 data, in which the twenty censored values are apparent at the
far  right.  These  values  would  be  a  continuation  of  the  right-hand  tail  if  the  values  were 
not  censored.  The  numbers  of  observations  with  censored  values  for  time2E4  for  each 
year  of  study  are  shown  in  Table  4. 
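
A minimal sketch of this substitution, under stated assumptions, is given below. The function name, the use of None for Marines who were not promoted, and the example values are all hypothetical; only the rule itself (one plus the maximum observed time) comes from the text.

```python
def censor_time2e4(times):
    """Fill missing promotion times with (max observed + 1) and flag them.

    `times` holds days to E-4, with None for Marines not promoted during
    the observation period. Returns (filled_times, censored_flags).
    """
    observed = [t for t in times if t is not None]
    fill = max(observed) + 1  # censoring value: one plus the largest observed time
    filled = [fill if t is None else t for t in times]
    flags = [t is None for t in times]
    return filled, flags


# Hypothetical values, not taken from the study data.
days = [812, 1040, None, 1200, 930, None]
filled, flags = censor_time2e4(days)
print(filled, flags)
```

Keeping the flags alongside the filled values matters because a censored-outcome regression needs to know which observations are lower bounds rather than exact promotion times.
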

Figure 1. Histogram of the number of days to promotion to the pay grade of E-4 (time2E4)
for FY2010 entries to the 0621 MOS


Table 4. Summary of data formatting and cleaning

Fiscal   Total          Observations removed   Observations   Observations with       Observations
Year     Observations   due to separation      used in        substitute value for    with censored
                        from service           analysis       martial arts belt level time2E4
2008     377            26                     351            23                      32
2009     384            30                     354            14                      25
2010     466            45                     421            0                       20

With  the  removal  of  observations  from  the  data  set  as  described  above,  the 
remaining  data  set  consists  of  1,126  Marines  across  three  fiscal  years. 

b. Grouping of Categorical Data

The  categorical  variables  considered  for  analysis  were  screened  to  ensure  they 
contained  a  sufficient  number  of  different  categories  for  use  as  potential  predictor 
variables. 


3.  Assumptions  and  Limitations  of  the  Data 

One of the purposes of this study is to analyze the process of career assignment in
order to provide valuable recommendations for future occupational classification. It is the
intention  of  this  study  that  the  modeling  techniques  and  recommendations  be  suitable  for 
the  current  manpower  selection  and  assignment  process  in  the  USMC.  Therefore,  we  seek 
to  develop  modeling  techniques  and  supporting  methodology  that  can  be  applied  to  a 
broad  range  of  MOSs,  particularly  the  high-density  MOSs,  in  order  to  gain  a  better 
understanding  of  the  overall  picture  of  career  placement.  This  study  focuses  on  the  0621 
Field  Radio  Operator  MOS.  We  do  not  consider  an  optimization  problem  placing  recruits 
into  the  various  MOSs;  instead,  we  focus  on  the  entry  attributes  that  may  indicate  a 
successful  match  with  an  occupation. 

The  Marine  Corps  uses  the  Computed  Tier  Score  for  re-enlistment  purposes.  The 
Computed  Tier  Score  is  a  quantitative  measurement  for  re-enlistment  eligibility.  This 
study  does  not  attempt  to  evaluate  the  validity  of  the  Computed  Tier  Score  as  a  measure 
of  performance  or  re-enlistment  suitability. 

B. VARIABLE DESCRIPTIONS

This  section  provides  descriptions  of  the  variables  considered  for  analysis.  All 
variables  that  have  potential  for  correlation  with  success,  re-enlistment,  and  MOS 
suitability  are  included.  Additionally,  only  those  variables  obtainable  through  TFDW  are 
analyzed. 

1. Dependent Variables

The  dependent  variables  considered  for  analysis  are  the  USMC  Computed  Tier 
Score  and  time  (in  days)  to  promote  to  the  pay  grade  of  E-4,  or  Corporal,  in  the  USMC. 

a. Computed Tier Score

The  USMC  uses  two  measures  for  determining  eligibility  for  re-enlistment,  the 
Computed  Tier  Score  and  the  Commander’s  Tier  Recommendation.  The  Computed  Tier 
Score  was  originally  introduced  in  May  2011  through  MARADMIN  273/11.  It  was 
created  in  order  to  provide  commanders  a  quantitative  assessment  of  an  individual 
Marine’s  performance.  The  Computed  Tier  Score  is  calculated  using  the  scores  from  a 
Marine’s  physical  fitness  test  (PFT),  combat  fitness  test  (CFT),  proficiency  and  conduct 
markings,  and  the  rifle  range  qualification  score.  Additionally,  points  are  awarded  for 
USMC  martial  arts  belt  level  and  for  meritorious  promotions  to  the  current  rank.  The 
Computed  Tier  Score  is  then  compared  to  all  Marines  within  the  same  MOS  that  are 
eligible  for  re-enlistment  during  the  same  fiscal  year.  An  example  of  a  Marine  Corps  Tier 
Worksheet  is  shown  in  Figure  2. 


[Figure 2 image not reproduced: an example tier worksheet for a Corporal in PMOS 0621. It
lists the MOS average and the Marine's own scores for the PFT, CFT, Proficiency, Conduct,
Rifle, MCMAP belt, and Meritorious Promotion events, with totals, followed by a legal
history block (NJPs) and a tier chart: Tier I (10%), 91%-100%; Tier II (30%), 61%-90%;
Tier III (50%), 11%-60%; Tier IV (10%).]

Figure 2. USMC Tier Worksheet (after GySgt B. Lodge, USMC, Personal Communication,
September 10, 2014).

The raw scores from the PFT, CFT, and Rifle Qualification are not weighted or
altered  in  the  calculation  of  the  Computed  Tier  Score.  The  Proficiency  and  Conduct 
markings  are  multiplied  by  100,  and  each  Marine  Corps  Martial  Arts  Program  (MCMAP) 
belt  level  is  associated  with  a  specific  point  value  when  added  to  the  total  score.  Finally, 
Marines  that  have  been  meritoriously  promoted  to  their  current  rank  receive  an  additional 
100  point  bonus,  as  long  as  they  have  no  misconduct  on  their  record  within  the  previous 
six  months  of  promotion  (GySgt  B.  Lodge,  USMC,  Personal  Communication,  September 
10,  2014).  These  point  values  are  then  summed  together  for  the  final  calculation  of  the 
Computed Tier Score. As seen in Figure 2, Marines are then evaluated against their
re-enlistment cohort and placed into Tiers 1-4, based on their respective percentile. For the
purposes  of  this  study,  we  use  the  non-categorized  Computed  Tier  Score  as  a  quantitative 
variable  for  analysis. 
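
The scoring rule described above can be sketched as follows. This is an illustration under assumptions, not the official USMC calculation: the function name and argument names are hypothetical, the belt point value passed in is a placeholder because the text does not list the points for each belt, and the misconduct check for the meritorious bonus is left to the caller.

```python
def computed_tier_score(pft, cft, rifle, proficiency, conduct,
                        belt_points, meritorious=False):
    """Sum the components of a Computed Tier Score as described in the text."""
    score = pft + cft + rifle            # raw scores, not weighted or altered
    score += round(proficiency * 100)    # proficiency marking multiplied by 100
    score += round(conduct * 100)        # conduct marking multiplied by 100
    score += belt_points                 # points for MCMAP belt (value assumed)
    if meritorious:
        score += 100                     # bonus for a meritorious promotion
    return score


# Illustrative inputs only; belt_points=50 is a hypothetical value.
example = computed_tier_score(pft=274, cft=264, rifle=303,
                              proficiency=4.3, conduct=4.3, belt_points=50)
print(example)
```

In this sketch the proficiency and conduct markings dominate the total only modestly, while a 100-point meritorious bonus is comparable in size to a large swing in any single fitness score.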

In order to calculate the Computed Tier Score for each Marine or observation in
the study, we capture the data for each component and generate the score using the
aforementioned algorithm. All data captured for the computation of the Computed Tier
Scores are taken on July 1 of the fiscal year prior to a Marine's end of active service
(EAS). This date is chosen because it marks the first day that Marines can apply for
re-enlistment, and mirrors the process that the USMC uses to offer re-enlistment.

b.  Time  to  Achieve  E-4 

The  second  dependent  variable  we  consider  is  time  (in  days)  to  achieve  the  pay 
grade  of  E-4,  or  Corporal,  in  the  USMC.  We  choose  this  metric  due  to  the  high 
significance  of  achieving  this  rank  in  the  USMC  and  the  possible  correlation  to 
performance  within  a  Marine’s  specific  MOS.  This  variable  is  referred  to  as  time2E4  in 
the  regression  model  outputs  used  in  the  analysis. 

2. Independent Variables

Table 5 contains a list of all independent variables that were considered in this study.

Table 5. Description of independent variables used in analysis

Variable Name          Type          Description
AGE                    Numerical     Age of Marine upon entering service
GENDER                 Categorical   Gender of Marine
HEIGHT                 Numerical     Height upon entering service
WEIGHT                 Numerical     Weight upon entering service
IST_CRUNCHES           Numerical     Number of crunches for Initial Skills Test
IST_RUN                Numerical     Run time (in seconds) for 1.5 mile run for Initial Skills Test
RIFLE_SCORE            Numerical     Initial Rifle Score during Basic Training
WAIV_TRAFFIC           Binary        Received waiver for having a traffic related offense prior to service
WAIV_MINOR_NONTRAFF    Binary        Received waiver for a minor non-traffic related offense prior to service
WAIV_MISCOND           Binary        Received waiver for a misconduct offense prior to service
WAIV_DRUG_SUBST        Binary        Received waiver for Drug or Substance usage prior to service
WAIV_WEIGHT            Binary        Received waiver for being over weight requirement prior to service
WAIV_ICD9              Binary        Received waiver for Medical reasons
WAIV_OTHER             Binary        Received waiver for other reasons not captured
GS                     Numerical     ASVAB GS subscore
MK                     Numerical     ASVAB MK subscore
PC                     Numerical     ASVAB PC subscore
AR                     Numerical     ASVAB AR subscore
AS                     Numerical     ASVAB AS subscore
WK                     Numerical     ASVAB WK subscore
MC                     Numerical     ASVAB MC subscore
EI                     Numerical     ASVAB EI subscore
GT_SCORE               Numerical     ASVAB GT composite score
MM_SCORE               Numerical     ASVAB MM composite score
CL_SCORE               Numerical     ASVAB CL composite score
EL_SCORE               Numerical     ASVAB EL composite score

C. METHODOLOGY

This section explains the techniques used to conduct the statistical analyses,
variable transformations, variable selection methods, and model validation techniques.
The following concepts are basic to fitting linear models, and the reader is referred to a
reference such as Faraway (2005) for further statistical understanding.

1.  Multivariate  Linear  Regression 

In  order  to  address  our  study  questions,  we  use  statistical  models  to  determine  if  a 
significant  relationship  exists  between  the  independent  variables  and  the  dependent 
(response)  variable.  The  response  variables  in  this  study  are  continuous,  and  are  analyzed 
separately  against  the  independent  variables.  Multivariate  linear  regression  models  are 
used for explaining the relationship between a single dependent variable $Y$, commonly
called the response, and multiple independent variables (predictors), $X_1, \ldots, X_p$
(Faraway, 2005, p. 6).

In a linear regression model, the continuous response variable $Y$ is modeled in
terms of $p$ independent variables $X = (X_1, X_2, \ldots, X_p)$. The general form for a
multivariate linear regression model is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

where $\beta = \{\beta_0, \beta_1, \ldots, \beta_p\}$ are unknown parameters, or coefficients,
that are associated with the independent variables. $\beta_0$ is the intercept term, and
$\varepsilon$ is the prediction error, or random error term that has no relationship to $X$
(Faraway, 2005, p. 11).

2.  Variable  Transformation 

Transforming  the  dependent  or  independent  variables  can  often  improve  the  fit  of 
a  model  and  correct  violations  of  model  assumptions.  It  is  important  to  explore  the 
possibility  of  improving  a  model  by  transforming  the  variables  included,  particularly  the 
dependent variable. While transforming the variables used in analysis may make the
results difficult to interpret upon initial inspection, it can provide a better model fit
(Faraway, 2005).

The Box-Cox transformation family is used in our study to determine an
appropriate transformation of the response variable. The Box-Cox family transforms the
response variable $y \rightarrow g_\lambda(y)$, where the transformation indexed by
$\lambda$ is as follows (Faraway, 2005, pp. 110-111):

$$g_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \text{when } \lambda \neq 0 \\ \log(y), & \text{when } \lambda = 0 \end{cases}$$

The best values of $\lambda$ and the regression parameters are determined using maximum
likelihood.
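
To make the transformation concrete, the following Python sketch implements the Box-Cox family and a grid search over the exponent. It is a simplification under stated assumptions: the profile log-likelihood is written for an intercept-only normal model rather than the full regression used in this study, and the function names, grid, and data values are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a single positive value y for exponent lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_loglik(ys, lam):
    """Profile log-likelihood of lam for an intercept-only normal model."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    mean = sum(z) / n
    rss = sum((v - mean) ** 2 for v in z)
    # The second term is the Jacobian of the transformation.
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(y) for y in ys)


def best_lambda(ys, grid):
    """Return the grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, lam))


# Hypothetical positive response values.
ys = [620.0, 700.0, 745.0, 810.0, 905.0, 1100.0, 1570.0]
grid = [g / 10.0 for g in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
print(best_lambda(ys, grid))
```

In a regression setting the residual sum of squares would come from refitting the model to the transformed response at each candidate exponent, with the rest of the calculation unchanged.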


3. Variable Selection

In developing statistical models, it is important to consider variable selection in
order to determine the best subset of independent variables to be included in the model.
Introducing too many independent variables ("overfitting") reduces the overall predictive
power of the model. In order to find the best set of independent variables for analysis and
to reduce the possibility of overfitting, we use Best Subsets Regression (Faraway, 2005,
pp. 127-128). Best Subsets Regression finds the best set of predictors for a given subset
size, and then chooses the subset size to optimize a criterion such as adjusted $R^2$.
Adjusted $R^2$, written $R_a^2$, is defined as follows (Faraway, 2005, p. 127):

$$R_a^2 = 1 - \frac{RSS/(n-p)}{TSS/(n-1)}$$

where

$$RSS = \text{residual sum of squares} = \sum_i (y_i - \hat{y}_i)^2, \qquad TSS = \text{total sum of squares} = \sum_i (y_i - \bar{y})^2,$$

$n$ is the number of observations in the data set, and $p$ is the number of estimated
coefficients in the fitted model (the predictor variables plus the intercept).
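
As a brief worked example, the adjusted R-squared definition can be evaluated directly. The sketch below is illustrative: the function names are hypothetical, and the inputs are the rounded summary values reported for the Computed Tier Score model in Figure 4 (421 observations, nine estimated coefficients, multiple R-squared of 0.1273), so the result agrees with the reported 0.1103 only up to rounding.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared from R-squared; p counts all coefficients,
    including the intercept, so n - p is the residual degrees of freedom."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


def adjusted_r2_from_ss(rss, tss, n, p):
    """The same quantity computed from the residual and total sums of squares."""
    return 1.0 - (rss / (n - p)) / (tss / (n - 1))


# Rounded inputs taken from the Figure 4 model summary.
value = adjusted_r2(0.1273, n=421, p=9)
print(round(value, 3))
```

The penalty factor (n - 1)/(n - p) grows with the number of coefficients, which is why adding a weak predictor can lower adjusted R-squared even though it never lowers the unadjusted value.
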
Cross validation can be used to select the subset size. This is done by randomly
selecting a given percentage of the data (e.g., ten percent), fitting the model on the
remaining set of data, and then calculating the sum of squared errors when using the model
to predict the held-out set. This procedure can be repeated many times to obtain a better
estimate of how well the model is able to predict new data. The number of predictor
variables used in the model is selected to minimize the estimated mean squared
prediction error. We use Best Subsets Regression with cross-validation, taking out a
randomly selected subset of ten percent of the observations each time for use as a test set,
repeating this procedure ten times.
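
The repeated ten percent holdout described above can be sketched as follows. This is a simplified illustration, not Best Subsets Regression itself: it compares only two candidate model sizes (no predictor versus one predictor) on synthetic data, and the function names, seed, and data are assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def holdout_mse(xs, ys, use_predictor, test_frac=0.10, reps=10, seed=1):
    """Mean squared prediction error over repeated random holdout splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, round(test_frac * n))
    total_sq, total_n = 0.0, 0
    for _ in range(reps):
        test_idx = set(rng.sample(range(n), n_test))
        train = [i for i in range(n) if i not in test_idx]
        if use_predictor:
            b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        else:
            # Size-zero model: predict the training mean for every case.
            b0, b1 = sum(ys[i] for i in train) / len(train), 0.0
        for i in test_idx:
            total_sq += (ys[i] - (b0 + b1 * xs[i])) ** 2
            total_n += 1
    return total_sq / total_n


# Synthetic data with a real linear signal plus noise.
noise = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [2.0 + 0.5 * x + noise.gauss(0.0, 1.0) for x in xs]
mse_null = holdout_mse(xs, ys, use_predictor=False)
mse_line = holdout_mse(xs, ys, use_predictor=True)
print(mse_null, mse_line)
```

Using the same seed for both candidates means each model size is scored on identical test sets, so the comparison of prediction error is not confounded by the random splits.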

4. Regression with a Censored Outcome Variable

We consider regression using the number of days for a Marine to be promoted to
the pay grade of E-4 (time2E4) as an outcome variable. As we discussed in section
A(2)(a) above, in a number of cases the Marine did not achieve this promotion in the
observable time period. These cases are "right censored" with the maximum observable
time used to represent these values. Their actual promotion times are greater than the
censored values. A regression model with censored values in the outcome variable can
be estimated taking censoring into account. We use the survreg function in the survival
package in R to fit these models. Because diagnostic tools are much better developed for
uncensored regression, we use uncensored regression first and then compare the results to
those obtained using the survreg function.
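
To illustrate how censoring can be taken into account, the sketch below writes out a log-likelihood in which an exactly observed outcome contributes its density and a right-censored outcome contributes the probability of exceeding the censoring value. It assumes normally distributed errors purely for illustration; the text does not state which error distribution is supplied to survreg, and the function names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(y, mu, sigma):
    """Log of P(Y > y) for a normal distribution (may underflow far in the tail)."""
    z = (y - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def censored_loglik(ys, censored, mus, sigma):
    """Log-likelihood with right-censoring.

    ys       : observed or censoring values
    censored : True where the value is only a lower bound
    mus      : fitted means, one per observation
    """
    total = 0.0
    for y, is_censored, mu in zip(ys, censored, mus):
        if is_censored:
            total += normal_logsf(y, mu, sigma)   # true value exceeds y
        else:
            total += normal_logpdf(y, mu, sigma)  # value observed exactly
    return total
```

Maximizing this function over the regression coefficients (which determine the fitted means) and the scale gives estimates that do not treat a censoring value as if it were an actual promotion time.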

5. Model Validation

It is important to validate a statistical model to ensure that the model provides
meaningful results. This section explains the techniques used to validate the linear
regression models.

The validity of the regression model depends on adherence to several key
assumptions. These model assumptions need to be validated using regression
diagnostics. The model assumptions are listed as follows:

1. The errors are independent, exhibit constant variance, and are normally
distributed.

2. The structural part of the model is correct.

3. Unusual observations are not overly influential in the model (Faraway,
2005, p. 53).

The  regression  diagnostics  are  conducted  using  a  set  of  diagnostic  plots  that 
allows  for  examination  of  these  model  assumptions. 

6. Software Used for Analysis

The  R  programming  language  is  used  (R  Development  Core  Team,  2014)  for  the 
analyses  performed  in  this  study. 

D. CHAPTER SUMMARY

This  chapter  provides  a  detailed  explanation  of  the  data  and  methodologies  used 
in  order  to  conduct  this  analysis.  Data  formatting,  observation  removal,  and  data  cleaning 
procedures  are  utilized  in  order  to  prepare  the  data  for  viable  statistical  modeling.  The 
independent  and  dependent  variables  for  consideration  are  modeled  using  multivariate 
linear  regression  while  considering  necessary  variable  transformations.  Finally,  the 
variable  selection  methods  and  model  validation  techniques  are  outlined  for  use  in 
directing  this  analysis. 

IV. RESULTS AND ANALYSIS


We present the results of fitting the statistical models that are described in
Chapter III. Two response variables are considered separately: the Computed Tier Score
calculated near the time that Marines are eligible for re-enlistment (about 2.5 years into
the initial enlistment); and the number of days required for an enlistee to make
promotion to the pay grade of E-4. These response variables are taken as measures of
success of an enlistee's placement in the 0621 MOS. For both response variables, we use
data on USMC first enlistments in the 0621 MOS for FY2010. This is the most recent
data available to us, and we also have found it to be the most reliable. We also explore
using a multi-year model that includes data on all entries from FY2008 to FY2010.

A.  COMPUTED  TIER  SCORE  ANALYSIS 

In this section we present the results of fitting a regression model to predict the
Computed Tier Score from a set of explanatory variables obtained at the initial point of
enlistment, using data for FY2010 entries. The independent variables used in all
regression analyses are described in Table 5.

1. Initial Variable Relationship Exploration

Figure 3 shows a series of plots that provide an initial look at the nature of the
relationships between each independent variable and the Computed Tier Score. The red
line in each plot is a regression trend line that describes the mean relationship between
the two variables.

Figure 3. Initial variable relationships to Computed Tier Score for the FY2010 data

Note:  Computed  Tier  Score  is  on  the  vertical  axis  of  each  plot,  and  each  independent 
variable  is  on  the  horizontal  axis. 


An  initial  observation  of  the  relationships  between  Computed  Tier  Score  and  the 
independent  variables  suggests  the  presence  of  possible  relationships  between  variables. 
For example, the upward trend of the red regression line in the IST_CRUNCHES plot
indicates that as the number of crunches increases, the Computed Tier Score increases.
Similarly, as the IST_RUN time increases, the Computed Tier Score decreases.




2. Evaluation of the Regression Model

We explore the possibility that the response variable, Computed Tier Score, may
need to be transformed in order to better satisfy the assumptions of a linear regression
model. To do this we use the Box-Cox transformation method described in Chapter III in
order to determine if a transformation of the dependent variable would be appropriate.
For this model, the Box-Cox method produces an estimated exponent of $\lambda = 5.6$,
which is extreme given that the numerical scale of Computed Tier Score is in the low
thousands. This result suggests that the Box-Cox family of transformations cannot provide
a useful transformation of the dependent variable, as discussed in Faraway (2005). We
decide not to transform the dependent variable in this case, accepting that by not doing so
the error terms may not be approximately normally distributed, which requires greater
care to guard against the effects of outliers and other influential observations.

We begin with all 18 possible predictor variables listed in Table 5, excluding the
ASVAB subscores. Variable selection using Best Subsets Regression with cross-validation
is performed in order to find a near-optimal model based on the original set of
independent variables as discussed in Chapter III. When conducting cross-validation, we
find the optimal model to contain five predictor variables: WEIGHT, IST_CRUNCHES,
IST_RUN, RIFLE_SCORE, and WAIV_WEIGHT. We then find the best subset that
maximizes adjusted $R^2$ in order to include variables that are highly regarded as
entrance criteria into an MOS in the USMC. The best subset size contains eight predictor
variables. The resulting model is summarized in Figure 4.




lm(formula = Tier ~ WEIGHT + IST_CRUNCHES + IST_RUN + RIFLE_SCORE +
    WAIV_WEIGHT + GT_SCORE + MM_SCORE + CL_SCORE, data = Master10)

Residuals:
    Min      1Q  Median      3Q     Max
-455.34  -33.64    7.38   47.42  245.74

Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      1769.18820    96.98545   18.242   < 2e-16  ***
WEIGHT             -0.34523     0.17437   -1.980   0.04839  *
IST_CRUNCHES        0.62392     0.23601    2.644   0.00852  **
IST_RUN            -0.20447     0.05594   -3.655   0.00029  ***
RIFLE_SCORE         0.46530     0.23745    1.960   0.05072  .
WAIV_WEIGHTTRUE   -65.00997    22.68478   -2.866   0.00437  **
GT_SCORE            2.13337     1.00373    2.125   0.03414  *
MM_SCORE           -1.38329     0.70565   -1.960   0.05063  .
CL_SCORE           -1.14032     0.74311   -1.535   0.12567
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 87.18 on 412 degrees of freedom
Multiple R-squared: 0.1273, Adjusted R-squared: 0.1103
F-statistic: 7.509 on 8 and 412 DF, p-value: 2.273e-09


Figure  4.  Computed  Tier  Score  model  output 


In  Figure  4,  the  “Estimate”  column  shows  the  regression  coefficients  for  each 
corresponding  predictor  variable,  while  the  “Pr(>|t|)”  column  gives  the  associated  p- 
values  for  each  estimate.  A  p-value  of  less  than  0.05  suggests  that  the  variable  is 
statistically  significant,  and  should  be  included  in  the  model. 
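
The relationship between the columns of Figure 4 can be checked with a short calculation: each t value is the estimate divided by its standard error. The sketch below uses the IST_RUN row as input. The p-value function is an assumption made for illustration: it uses a normal approximation, whereas R reports p-values from a t distribution with 412 degrees of freedom, so the result is close to, but not the same as, the reported 0.00029.

```python
import math


def t_value(estimate, std_error):
    """t statistic for testing that a coefficient equals zero."""
    return estimate / std_error


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (adequate for large df)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# IST_RUN row of Figure 4: estimate and standard error.
t_run = t_value(-0.20447, 0.05594)
p_run = approx_two_sided_p(t_run)
print(round(t_run, 3), p_run)
```

The same calculation applied to CL_SCORE gives a t value of about -1.535, which is too small in magnitude to meet the 0.05 threshold.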

The  model  from  Figure  4  includes  421  records  and  eight  predictor  variables  from 
the  2010  data  set.  Table  6  shows  the  descriptive  statistics  for  the  seven  continuous 
variables  included  in  the  model.  The  descriptive  statistics  shown  are  mean,  median, 
standard  deviation,  minimum  value,  and  maximum  value.  The  only  binary  variable 
included  in  the  model  is  WAIV_WEIGHT,  with  405  Marines  (96.2  percent)  not  assigned 
a  weight  waiver  and  16  Marines  (3.8  percent)  receiving  a  weight  waiver  before  entering 
active  service. 

Table 6. Descriptive statistics for the quantitative variables used in Computed Tier Score analysis

Variable        Mean      Median    Standard Deviation    Minimum    Maximum
WEIGHT          161.90    161       25.62                 96         224
IST_CRUNCHES    75.41     73        19.51                 44         155
IST_RUN         690.90    690       83.66                 460        892
RIFLE_SCORE     287.50    290       19.42                 250        329
GT_SCORE        101.80    99        10.40                 80         136
MM_SCORE        100.20    98        11.89                 69         140
CL_SCORE        103.40    101       8.87                  87         137

We explore the necessity for non-linear transformations of the independent
variables using partial residual plots (Faraway, 2005). A convenient class of
transformations to consider for this purpose is cubic basis splines with four interior
knots, placed at the 10th, 30th, 50th, and 70th percentiles of a variable. We use these
splines to determine if a non-linear transformation of the predictor variables would
improve the model. When used with variable transformations, these plots along with
95 percent confidence bands suggest the types of transformations that are plausible for
the predictor variables. For example, if a straight line fits within the confidence bands, it
is unlikely that a nonlinear transformation is needed to bring out the explanatory power of
the variable in question. The resulting partial residual plots are shown in Figure 5. It is
clear that straight lines can be fit within the confidence bands of each of these plots,
which suggests that a simple linear model formulation should be adequate. We confirm
this by conducting an F-test with (42, 370) degrees of freedom in order to compare the
results from a model with variable transformation versus a model without transformation.
The resulting F-statistic is 0.9534 with a p-value of 0.5574. This comparison indicates
that the model with transformations does not fit significantly better than the model without
transformation at the $\alpha = 0.05$ test level. Therefore, we do not reject the null
hypothesis and conclude that non-linear transformation of the predictor variables is not
necessary.
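
The F-statistic for comparing two nested models can be sketched as below. The residual degrees of freedom in the usage line (412 for the linear model, 370 for the spline model) follow from the figures reported in the text; the residual sums of squares are hypothetical placeholders, because the text reports only the resulting statistic, and the function name is an assumption.

```python
def nested_f(rss_small, rss_big, df_small, df_big):
    """F statistic comparing a smaller model nested in a bigger one.

    rss_* are residual sums of squares; df_* are residual degrees of
    freedom. Returns (F, numerator df, denominator df).
    """
    q = df_small - df_big  # number of extra parameters in the bigger model
    f_stat = ((rss_small - rss_big) / q) / (rss_big / df_big)
    return f_stat, q, df_big


# Hypothetical RSS values; only the degrees of freedom come from the text.
f_stat, df_num, df_den = nested_f(rss_small=3_100_000.0, rss_big=2_800_000.0,
                                  df_small=412, df_big=370)
print(f_stat, df_num, df_den)
```

The numerator degrees of freedom, 412 - 370 = 42, correspond to the additional spline terms: six extra basis functions for each of the seven continuous predictors.
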

Figure 5. Partial residual plots of the predictor variables used in the analysis of
Computed Tier Score


Note:  The  red  line  is  the  cubic  regression  spline,  and  the  blue  lines  are  95  percent 
confidence  bands.  If  a  straight  line  fits  between  the  blue  confidence  bands,  a  good 
indication  of  a  linear  relationship  exists. 

The regression diagnostics are displayed in Figure 6 using a set of diagnostic
plots that allow for examination of the model assumptions.

Figure 6. Computed Tier Score model diagnostics


As shown in Figure 6, the Residuals vs. Fitted plot shows no obvious patterns of
unequal spread about the x-axis, thus indicating that the residuals exhibit constant
variance. The Normal Q-Q plot indicates the presence of heavier than normal tails, and
exhibits possible signs of non-normality. The Residuals vs. Leverage plot shows no
indication of overly influential data points in the model. In other respects, the model
diagnostics indicate that the model assumptions are not violated.

3.  Explanation  of  the  Model  Results 

From the model fit in Figure 4, we determine that the most significant predictor
variables are IST_RUN, WAIV_WEIGHT, and IST_CRUNCHES. Additionally,
WEIGHT and GT_SCORE narrowly meet the 0.05 p-value threshold for inclusion in the
model. MM_SCORE and CL_SCORE exhibit interesting relationships to Computed Tier
Score, indicating that with a higher score in either composite, the predicted Computed Tier
Score actually decreases. This model provides statistically significant predictability
(F-statistic p-value of 2.273e-09) for measuring success in terms of Computed Tier Score,
although it explains only about 13 percent of the variance (multiple R-squared of 0.1273).
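
As a worked illustration of how the fitted model is read, the sketch below applies the coefficients from Figure 4 to a hypothetical recruit whose continuous attributes equal the sample means in Table 6 and who has no weight waiver. The dictionary keys and function name are assumptions made for the example, and the coefficients are the rounded values printed in the model output.

```python
# Rounded coefficient estimates from the Figure 4 model output.
COEF = {
    "(Intercept)": 1769.18820,
    "WEIGHT": -0.34523,
    "IST_CRUNCHES": 0.62392,
    "IST_RUN": -0.20447,
    "RIFLE_SCORE": 0.46530,
    "WAIV_WEIGHT": -65.00997,  # applies when the waiver indicator is 1
    "GT_SCORE": 2.13337,
    "MM_SCORE": -1.38329,
    "CL_SCORE": -1.14032,
}


def predict_tier(recruit):
    """Predicted Computed Tier Score for a dict of entry-level attributes."""
    score = COEF["(Intercept)"]
    for name, value in recruit.items():
        score += COEF[name] * float(value)
    return score


# Hypothetical recruit at the Table 6 sample means, with no weight waiver.
typical = {"WEIGHT": 161.90, "IST_CRUNCHES": 75.41, "IST_RUN": 690.90,
           "RIFLE_SCORE": 287.50, "WAIV_WEIGHT": 0, "GT_SCORE": 101.80,
           "MM_SCORE": 100.20, "CL_SCORE": 103.40}
baseline = predict_tier(typical)

with_waiver = predict_tier({**typical, "WAIV_WEIGHT": 1})
more_crunches = predict_tier({**typical, "IST_CRUNCHES": 85.41})
print(round(baseline, 1), round(with_waiver - baseline, 2),
      round(more_crunches - baseline, 2))
```

Holding everything else fixed, a weight waiver lowers the predicted score by about 65 points, while ten additional crunches raise it by about 6 points, which puts the relative size of the two effects in perspective.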

B. ANALYSIS OF THE TIME TO ACHIEVE E-4 USING ALL POSSIBLE PREDICTOR
VARIABLES INCLUDING ASVAB SUBSCORES

The second dependent variable we consider in this analysis is the time it takes in
days for a Marine to achieve the pay grade of E-4, and is referred to as time2E4 in this
study. The remaining models in this study focus exclusively on analyzing the entry-level
attributes of a Marine recruit against this dependent variable.

Upon initial observation and variable correlation exploration, we determine that
the ASVAB subscores are highly correlated with the ASVAB composite scores. This
observation makes sense, given that the composite scores are derived from the subscores.
Therefore, we perform two separate linear regressions in order to accurately consider all
of the possible predictors. The first model considers all possible predictor variables
excluding the ASVAB composite scores. The second model, which we discuss in section
C, considers all possible predictor variables while excluding the ASVAB subscores.

1. Initial Variable Relationship Exploration

Figure 7 shows a series of plots that provide an initial look at the nature of the
relationships between each independent variable and time2E4. The red line in each
plot is a regression trend line that describes the mean relationship between the two
variables.

Figure 7.  Initial variable relationships to time to promote to E-4 for the 2010 data set


Note:  Time2E4  is  on  the  vertical  axis  of  each  plot,  and  each  independent  variable  is  on 
the  horizontal  axis. 


An initial inspection of the plots suggests possible relationships between time2E4 and
several of the independent variables. For example,
the downward trend of the red regression line in the IST_CRUNCHES plot indicates that
as the number of crunches increases, the time to achieve the pay grade of E-4 decreases.
Similarly, as the IST_RUN time increases, the time to achieve E-4 increases.

2.  Evaluation of the Regression Model

We explore the possibility that the response variable, time2E4, may need to be
transformed in order to better satisfy the assumptions of a linear regression model. Based
on the application of the Box-Cox procedure as described in Chapter III, the dependent
variable, time2E4, is transformed by being raised to the power -0.7.
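
Because the exponent is negative, the transformation reverses the ordering of promotion times: a shorter time2E4 maps to a larger transformed value. The following Python sketch illustrates this and the matching back-transformation. It is an illustration only; the thesis works in R, and the function names and the two example times are hypothetical.

```python
LAMBDA = -0.7  # power reported in the text for the single-year time2E4 models


def to_model_scale(days, lam=LAMBDA):
    """Raise a time2E4 value (in days) to the power lam."""
    if days <= 0:
        raise ValueError("time2E4 must be positive")
    return days ** lam


def to_days(y, lam=LAMBDA):
    """Invert the power transformation to recover a time in days."""
    if y <= 0:
        raise ValueError("transformed value must be positive")
    return y ** (1.0 / lam)


# Hypothetical promotion times, in days, chosen only for illustration.
fast, slow = 600.0, 900.0
y_fast, y_slow = to_model_scale(fast), to_model_scale(slow)

# The faster promotion has the larger value on the model scale.
print(y_fast > y_slow)
```

One consequence for reading the regression output: on the transformed scale, a positive coefficient corresponds to a shorter predicted time to E-4, and a negative coefficient to a longer one.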

Prior to estimating the linear regression model, we conduct variable selection in
order to find a near-optimal model based on the original set of independent variables. We
begin with 22 possible predictor variables, as listed in Table 5, excluding the ASVAB
composite scores. Variable selection using Best Subsets Regression with cross-validation
is performed to find the best subset of the original independent variables, as discussed in
Chapter III. Figure 8 shows the results of fitting the linear regression model for an
individual Marine’s predicted time2E4.


lm(formula = Ytime ~ IST_CRUNCHES + IST_RUN + RIFLE_SCORE + WAIV_WEIGHT +
    GS + MK + PC, data = Master10)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0044329 -0.0008939  0.0000736  0.0010470  0.0062517

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.348e-03  1.734e-03   2.507  0.01254 *
IST_CRUNCHES     1.177e-05  4.180e-06   2.816  0.00509 **
IST_RUN         -2.530e-06  9.829e-07  -2.574  0.01040 *
RIFLE_SCORE      1.260e-05  4.061e-06   3.103  0.00205 **
WAIV_WEIGHTTRUE -8.545e-04  4.021e-04  -2.125  0.03414 *
GS              -4.092e-05  1.383e-05  -2.958  0.00327 **
MK               4.516e-05  1.491e-05   3.030  0.00260 **
PC               3.780e-05  1.505e-05   2.513  0.01236 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001551 on 413 degrees of freedom
Multiple R-squared:  0.1432,  Adjusted R-squared:  0.1286
F-statistic: 9.859 on 7 and 413 DF,  p-value: 2.181e-11


Figure 8.  All variables with ASVAB subscore model output

Based on the results from Figure 8, the variables included in the model are
statistically significant with p-values of less than 0.05, as seen in the “Pr(>|t|)” column.
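
The summary statistics in Figure 8 are tied together by standard identities: each t value is the estimate divided by its standard error, and the adjusted R-squared and F-statistic follow from the multiple R-squared and the degrees of freedom. The Python sketch below checks these identities using values copied from Figure 8. The helper names are illustrative, and the sample size of 421 is inferred from 413 residual degrees of freedom plus 8 estimated coefficients rather than stated in the output.

```python
def t_value(estimate, std_error):
    """t statistic for a single coefficient."""
    return estimate / std_error


def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared; p counts predictors, excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F statistic on (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


N_OBS = 421     # inferred: 413 residual df + 8 coefficients
N_PRED = 7      # predictors in the Figure 8 model
R2 = 0.1432     # multiple R-squared from Figure 8

t_crunches = t_value(1.177e-05, 4.180e-06)   # IST_CRUNCHES row
adj_r2 = adjusted_r_squared(R2, N_OBS, N_PRED)
f_stat = f_statistic(R2, N_OBS, N_PRED)

print(round(t_crunches, 3), round(adj_r2, 4), round(f_stat, 2))
```

The recomputed values agree with the figure up to the rounding of the printed inputs, which is a useful sanity check when an output listing has been transcribed by hand or recovered from a scan.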

Table 7 shows the descriptive statistics for the six quantitative variables included
in the model. The descriptive statistics shown are mean, median, standard deviation,
minimum value, and maximum value.


Table 7.  Descriptive statistics for the quantitative variables used in the
analysis of time to achieve E-4 using ASVAB subscore and all predictors

Variable       Mean     Median   Standard    Minimum   Maximum
                                 Deviation
IST_CRUNCHES   75.41    73       19.51       44        155
IST_RUN        690.90   690      83.66       460       892
RIFLE_SCORE    287.50   290      19.42       250       329
GS             50.58    50       6.35        35        73
MK             53.00    53       5.14        38        72
PC             51.47    51       5.85        37        69

We explore the necessity for non-linear transformations of the independent
variables using partial residual plots (Faraway, 2005). As shown in Figure 9, we use cubic
basis splines with four interior knots in order to determine whether a non-linear transformation
of the predictor variables would improve the model. Straight lines can be
fit within the confidence bands of each of these plots, which suggests that a simple linear
model formulation should be adequate. We confirm this by conducting an F-test with
(42, 370) degrees of freedom to compare a model with variable
transformation against a model without transformation. The resulting F-statistic is 1.5163
with a p-value of 0.1715. This comparison indicates that the model with transformations
is not significantly different from the model without transformation at the α = 0.05 test
level. Therefore, we do not reject the null hypothesis and conclude that non-linear
transformation of the predictor variables is not necessary.

Figure 9.  Partial residual plots of the predictor variables used in the analysis of time to
achieve E-4 using ASVAB subscore and all predictors

Note: The red line is the cubic regression spline, and the blue lines are 95 percent
confidence bands. A straight line that fits between the blue confidence bands is a good
indication of a linear relationship.

The model diagnostic plots shown in Figure 10 indicate that the model
assumptions are met and support the findings of the model.

Figure 10.  All variables with ASVAB subscores model diagnostics,
after Box-Cox transformation


The  Residuals  vs.  Fitted  plot  shows  no  signs  of  heteroscedasticity  as  there  are  no 
obvious  patterns  of  unequal  spread  about  the  horizontal  axis,  thus  indicating  that  the 
residuals  exhibit  constant  variance.  The  Normal  Q-Q  plot  indicates  that  the  distribution  of 
our  data  supports  normality  as  the  points  trend  nearly  to  a  straight  line.  Finally,  the 
Residuals  vs.  Leverage  plot  indicates  that  there  are  no  overly  influential  data  points  in  the 
model. The largest value for Cook’s distance is 0.044, which is well below the commonly
used warning value of 0.5. The plots in Figure 10 indicate that the model assumptions
have been met and that the model is valid.
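
For readers unfamiliar with the influence measure, Cook’s distance for one observation can be written in terms of its residual, its leverage, the number of fitted coefficients, and the residual standard error. The Python sketch below uses that textbook form. The residual and leverage values are hypothetical, since the thesis reports only the largest distance (0.044), and the function name is illustrative.

```python
def cooks_distance(residual, leverage, n_coef, sigma):
    """Cook's distance from a raw residual and its leverage (hat value).

    n_coef counts all fitted coefficients, including the intercept;
    sigma is the residual standard error of the fitted model.
    """
    if not 0.0 <= leverage < 1.0:
        raise ValueError("leverage must lie in [0, 1)")
    return (residual ** 2 / (n_coef * sigma ** 2)) * leverage / (1.0 - leverage) ** 2


SIGMA = 0.001551   # residual standard error from Figure 8
N_COEF = 8         # intercept plus seven predictors in Figure 8
WARNING = 0.5      # warning value cited in the text

# Hypothetical observation: a fairly large residual with modest leverage.
d = cooks_distance(residual=0.0040, leverage=0.05, n_coef=N_COEF, sigma=SIGMA)
print(d < WARNING)
```

The form makes the diagnostic logic visible: an observation needs both a large residual and high leverage to approach the warning value, which is why a residual near the extremes of Figure 8 can still be harmless.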

3.  Explanation  of  the  Model  Results 

From the model fit in Figure 8, we find that the most significant predictor
variables for time2E4 are RIFLE_SCORE, MK, GS, IST_CRUNCHES, IST_RUN, PC,
and WAIV_WEIGHT. It is important to note that while these variables are statistically
significant in this model, they would not necessarily be statistically significant or have
the same level of significance when modelled with a different year of records. The model
results differ from those of the Computed Tier Score model and show different
relationships between the entry-level attributes and each dependent variable. This
provides evidence that the two metrics, or dependent variables, used in our analysis are
substantially different.

To evaluate the effect each predictor variable has on the estimated time to achieve
the pay grade of E-4, we use the median values shown in Table 7 to create a notional
Marine for comparison. This notional Marine, without a weight waiver, has an
estimated time2E4 of approximately 787.2 days, with a 95 percent confidence interval of
[526.6, 1380.3]. If the notional Marine received a weight waiver before entering service,
then the estimated time2E4 is approximately 902.2 days, with a 95 percent confidence
interval of [576.4, 1739.3].
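
The point estimates above can be recovered from the coefficients in Figure 8 by evaluating the linear predictor at the Table 7 medians and inverting the power transformation. The Python sketch below does this. The interval calculation is only an approximation: it ignores the leverage term and assumes a t quantile of about 1.966 for 413 degrees of freedom, so it tracks the reported interval roughly rather than exactly. The helper names are illustrative and do not come from the thesis, which uses R.

```python
LAMBDA = -0.7        # power applied to time2E4 before fitting
SIGMA = 0.001551     # residual standard error reported in Figure 8
T_QUANTILE = 1.966   # assumed 97.5th percentile of t with 413 df

# Coefficients copied from Figure 8.
COEF = {
    "(Intercept)": 4.348e-03,
    "IST_CRUNCHES": 1.177e-05,
    "IST_RUN": -2.530e-06,
    "RIFLE_SCORE": 1.260e-05,
    "WAIV_WEIGHTTRUE": -8.545e-04,
    "GS": -4.092e-05,
    "MK": 4.516e-05,
    "PC": 3.780e-05,
}

# Notional Marine: medians from Table 7.
MEDIANS = {"IST_CRUNCHES": 73, "IST_RUN": 690, "RIFLE_SCORE": 290,
           "GS": 50, "MK": 53, "PC": 51}


def linear_predictor(marine, waiver=False):
    """Fitted value on the transformed scale, time2E4 ** LAMBDA."""
    y = COEF["(Intercept)"]
    if waiver:
        y += COEF["WAIV_WEIGHTTRUE"]
    for name, value in marine.items():
        y += COEF[name] * value
    return y


def to_days(y):
    """Invert the power transformation."""
    return y ** (1.0 / LAMBDA)


def predict_days(marine, waiver=False):
    return to_days(linear_predictor(marine, waiver))


def approx_interval(marine, waiver=False):
    """Rough 95 percent interval in days, ignoring leverage.

    The transformation is decreasing, so the upper bound on the
    transformed scale becomes the lower bound in days.
    """
    y = linear_predictor(marine, waiver)
    half_width = T_QUANTILE * SIGMA
    return to_days(y + half_width), to_days(y - half_width)


point = predict_days(MEDIANS)
with_waiver = predict_days(MEDIANS, waiver=True)
low, high = approx_interval(MEDIANS)
print(round(point, 1), round(with_waiver, 1), round(low, 1), round(high, 1))
```

That the reported interval is reproduced to within about a day by plus or minus roughly 1.97 residual standard errors suggests it behaves like an interval for an individual Marine’s time rather than for the mean; this reading is an inference from the numbers, not a statement made in the text.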

Tables 8 and 9 show the individual effect on the estimated time2E4 when
increasing or decreasing the six numerical predictor variables individually by 10 percent,
as well as varying WAIV_WEIGHT from false to true. Beginning in the second column,
each column shows the effect on the predicted time2E4 of changing only the heading
variable while holding all other variables constant. The “Difference” row shows the
individual impact that each change in the predictor variable has on time2E4. The
“Accounting for censoring” row shows the predicted time2E4 while accounting for
censoring in the model, using the method described in Chapter IV for fitting regressions
to censored data. The variable names have been shortened for presentation of the data.

Table 8.  Effect of increasing predictor variable values on predicted time to achieve the pay grade of E-4

Variable                  Notional  CRUNCHES  RUN     RIFLE   GS      MK      PC      WEIGHT
CRUNCHES                  73        80        73      73      73      73      73      73
RUN                       690       690       621     690     690     690     690     690
RIFLE                     290       290       290     319     290     290     290     290
GS                        50        50        50      50      55      50      50      50
MK                        53        53        53      53      53      58      53      53
PC                        51        51        51      51      51      51      56      51
WEIGHT                    FALSE     FALSE     FALSE   FALSE   FALSE   FALSE   FALSE   TRUE
Time2E4                   787.2     777.5     766.8   745.5   812.4   761.0   765.1   902.2
Difference                -         -9.7      -20.4   -41.7   25.2    -26.2   -22.1   115.0
Accounting for censoring  791.2     780.8     769.8   749.1   817.1   764.6   767.9   914.1

Note: The changes to each predictor variable are indicated by the red numbers,
while holding all other values of the predictor variables constant.

Table 9.  Effect of decreasing predictor variable values on predicted time to achieve the pay grade of E-4

Variable                  Notional  CRUNCHES  RUN     RIFLE   GS      MK      PC      WEIGHT
CRUNCHES                  73        66        73      73      73      73      73      73
RUN                       690       690       759     690     690     690     690     690
RIFLE                     290       290       290     261     290     290     290     290
GS                        50        50        50      50      45      50      50      50
MK                        53        53        53      53      53      48      53      53
PC                        51        51        51      51      51      51      46      51
WEIGHT                    FALSE     FALSE     FALSE   FALSE   FALSE   FALSE   FALSE   TRUE
Time2E4                   782.2     797.2     808.6   833.2   762.4   815.1   810.5   902.2
Difference                -         15.0      26.4    51.0    -19.8   32.9    28.3    120.0
Accounting for censoring  791.2     802.0     813.7   837.6   766.7   819.5   815.8   914.1

Note: The changes to each predictor variable are indicated by the red numbers,
while holding all other values of the predictor variables constant.

From Table 8, the largest improvement in predicted time2E4 results from an
increase in Rifle Score, followed by MK, PC, Run Time, and Crunches, respectively.
Receiving a weight waiver significantly impacts the predicted value in a negative way, by
increasing the predicted time2E4 by 115 days. Table 9 presents the effects of decreasing
each of the independent variables by the same magnitudes of change used in Table 8. As
shown in the model summary presented in Figure 8, the GS ASVAB subscore indicates a
surprisingly negative relationship with achieving the pay grade of E-4. This may be due
to the correlation of the GS subscore to the other predictor variables present in the model,
and warrants further investigation as additional data become available. The last row of
these two tables gives the results of applying the survreg function in R to account for the
twenty censored values of time2E4. Not surprisingly, the predicted times to promotion
are somewhat larger when censoring is taken into account, although the effect is minimal.

C.  ANALYSIS  OF  THE  TIME  TO  ACHIEVE  E-4  USING  ALL  POSSIBLE 

PREDICTOR  VARIABLES  INCLUDING  ASVAB  COMPOSITE  SCORES 

1.  Evaluation  of  the  Regression  Model 

We  first  explore  the  possibility  of  transforming  the  response  variable,  time2E4, 
using  the  application  of  the  Box-Cox  procedure  outlined  in  Chapter  III.  Based  on  an 
application  of  this  procedure,  the  dependent  variable  was  transformed  by  being  raised  to 
the power -0.7.

We conduct variable selection in order to find a near-optimal model based on the
original set of 20 independent variables, as listed in Table 5, excluding the ASVAB
subscores. Best Subsets Regression with cross-validation is used to identify a
subset of predictor variables for the development of the regression model. Figure 11
shows the results of fitting the linear regression model using the optimal set of
independent variables for an individual Marine’s predicted time to achieve the pay grade of
E-4. There was no need for non-linear transformation of variables, as the predictive
power of the model would not be improved.




lm(formula = Ytime ~ IST_CRUNCHES + IST_RUN + RIFLE_SCORE + WAIV_WEIGHT +
    CL_SCORE, data = Master10)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0045221 -0.0008783  0.0001143  0.0010144  0.0062726

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.198e-03  1.696e-03   3.065  0.00232 **
IST_CRUNCHES     1.262e-05  4.229e-06   2.983  0.00302 **
IST_RUN         -2.733e-06  9.956e-07  -2.745  0.00632 **
RIFLE_SCORE      1.050e-05  4.058e-06   2.588  0.01000 **
WAIV_WEIGHTTRUE -8.106e-04  4.071e-04  -1.991  0.04711 *
CL_SCORE         2.029e-05  8.727e-06   2.325  0.02056 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001575 on 415 degrees of freedom
Multiple R-squared:  0.1123,  Adjusted R-squared:  0.1016
F-statistic: 10.5 on 5 and 415 DF,  p-value: 1.657e-09


Figure 11.  All variables with ASVAB composite scores model output


The results of model fitting shown in Figure 11 indicate that the variables
included in the model are statistically significant with p-values of less than 0.05.

The  descriptive  statistics  for  the  four  quantitative  variables  included  in  the  model 
are  displayed  in  Table  10,  and  include  mean,  median,  standard  deviation,  minimum 
value,  and  maximum  value. 


Table 10.  Descriptive statistics for the quantitative variables used in the
analysis of time to achieve E-4 using ASVAB composite score and all predictors

Variable       Mean     Median   Standard    Minimum   Maximum
                                 Deviation
IST_CRUNCHES   75.41    73       19.51       44        155
IST_RUN        690.90   690      83.66       460       892
RIFLE_SCORE    287.50   290      19.42       250       329
CL_SCORE       103.40   101      8.87        87        137

The model diagnostic plots given in Figure 12 indicate that the model
assumptions are met and support the findings of the model.

Figure 12.  All variables with ASVAB composite scores model diagnostics,
after Box-Cox transformation


The diagnostic plots provide evidence that the errors are independent, have
constant variance, are normally distributed, and contain no overly influential observations
that could affect the model.

2.  Explanation of the Model Results

From the fitted model in Figure 11, we have determined that the most significant
predictor variables are IST_RUN, IST_CRUNCHES, RIFLE_SCORE, CL_SCORE, and
WAIV_WEIGHT. We evaluate the effect that each predictor variable has on the
estimated time to achieve the pay grade of E-4 by using the median values shown in
Table 10 to create a notional Marine for comparison. This notional Marine without a
weight waiver has an estimated time2E4 of approximately 794.9 days, with a 95 percent
confidence interval of [527.6, 1415.3]. If the notional Marine received a weight waiver
before entering service, then the estimated time2E4 is approximately 905.1 days, with a
95 percent confidence interval of [574.3, 1770.9].

Tables 11 and 12 show the individual effect on the estimated time2E4 when
increasing or decreasing the four numerical predictor variables individually by 10 percent,
as well as varying WAIV_WEIGHT from false to true. The changes to each predictor
variable are indicated by the red numbers. Beginning in the second column, each column
shows the effect on the predicted time2E4 (in days) of changing only the heading
variable while holding all other variables constant. The “Difference” row shows the
individual impact that each change in the predictor variable has on time2E4. The
“Accounting for censoring” row shows the predicted time2E4 while accounting for
censoring in the model, displaying only minimal effect from censoring. The variable
names have been shortened for presentation of the data.
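
The one-at-a-time comparison described above can be sketched in Python from the Figure 11 coefficients and the Table 10 medians. The sketch changes a single predictor, recomputes the prediction, and reports the difference from the notional Marine. The dictionary keys follow the shortened column names of Table 11, the function names are illustrative, and the thesis itself performs these calculations in R.

```python
LAMBDA = -0.7  # power applied to time2E4 before fitting

# Coefficients copied from Figure 11, keyed by the shortened names of Table 11.
COEF = {
    "(Intercept)": 5.198e-03,
    "CRUNCHES": 1.262e-05,
    "RUN": -2.733e-06,
    "RIFLE": 1.050e-05,
    "CL_SCORE": 2.029e-05,
    "WEIGHT": -8.106e-04,   # WAIV_WEIGHTTRUE
}

# Notional Marine (Table 10 medians, no weight waiver).
NOTIONAL = {"CRUNCHES": 73, "RUN": 690, "RIFLE": 290, "CL_SCORE": 101, "WEIGHT": False}

# Changed value used in each column of Table 11.
TABLE_11_VALUES = {"CRUNCHES": 80, "RUN": 621, "RIFLE": 319, "CL_SCORE": 111, "WEIGHT": True}


def predict_days(marine):
    """Predicted time2E4 in days for one set of predictor values."""
    y = COEF["(Intercept)"]
    for name, value in marine.items():
        y += COEF[name] * float(value)   # True/False count as 1/0
    return y ** (1.0 / LAMBDA)


def one_at_a_time(base, changes):
    """Change one predictor at a time, holding the others at their base values."""
    base_days = predict_days(base)
    rows = {}
    for name, new_value in changes.items():
        scenario = dict(base)
        scenario[name] = new_value
        days = predict_days(scenario)
        rows[name] = (days, days - base_days)
    return base_days, rows


base_days, table = one_at_a_time(NOTIONAL, TABLE_11_VALUES)
print(f"{'Notional':<10} {base_days:7.1f}")
for name, (days, diff) in table.items():
    print(f"{name:<10} {days:7.1f} {diff:+7.1f}")
```

Because the back-transformation is non-linear, the size of each difference depends on the starting point, which is why the increases and decreases in the two tables are not mirror images of one another.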


Table 11.  Effect of increasing predictor variable value on the predicted time
to achieve the pay grade of E-4

Variable                  Notional  CRUNCHES  RUN     RIFLE   CL_SCORE  WEIGHT
CRUNCHES                  73        80        73      73      73        73
RUN                       690       690       621     690     690       690
RIFLE                     290       290       290     319     290       290
CL_SCORE                  101       101       101     101     111       101
WEIGHT                    FALSE     FALSE     FALSE   FALSE   FALSE     TRUE
Time2E4                   794.9     784.2     772.5   759.2   770.8     905.1
Difference                -         -10.7     -22.4   -35.7   -24.1     110.2
Accounting for censoring  798.9     787.5     775.4   763.0   774.3     917.0

Note: The changes to each predictor variable are indicated by the red numbers, while
holding all other values of the predictor variables constant.

Table 12.  Effect of decreasing predictor variable values on the predicted time
to achieve the pay grade of E-4

Variable                  Notional  CRUNCHES  RUN     RIFLE   CL_SCORE  WEIGHT
CRUNCHES                  73        66        73      73      73        73
RUN                       690       690       759     690     690       690
RIFLE                     290       290       290     261     290       290
CL_SCORE                  101       101       101     101     91        101
WEIGHT                    FALSE     FALSE     FALSE   FALSE   FALSE     TRUE
Time2E4                   794.9     805.7     818.4   833.4   820.2     905.1
Difference                -         10.8      23.5    38.5    25.3      110.2
Accounting for censoring  798.9     810.6     823.6   837.9   824.9     917.0

Note: The changes to each predictor variable are indicated by the red numbers, while
holding all other values of the predictor variables constant.


From Table 11, the largest improvement in predicted time2E4 results from an
increase in Rifle Score, followed by CL_SCORE, Run Time, and Crunches, respectively.
Receiving a weight waiver significantly impacts the predicted value in a negative way, by
increasing the predicted time2E4 by 110.2 days. Table 12 presents the effect of
degrading each of the independent variables and yields the same ranking of
the independent variables.

3.  Evaluation and Comparison of the Regression Model Results for the
ASVAB Subscore Model and the ASVAB Composite Score Model

This  section  provides  a  summary  and  comparison  of  the  model  outputs  from  the 
two  models  considered  in  predicting  time2E4;  the  regression  model  that  uses  all  possible 
predictor  variables  including  the  ASVAB  subscores  and  the  regression  model  that  uses  all 
possible  predictor  variables  including  the  ASVAB  composite  scores. 

Table 13 displays a summary of the model predictions for a notional Marine that
did not receive a weight waiver.

Table 13.  Comparison of model results for a notional Marine that did not
receive a weight waiver

Model            Variables included in Model   Predicted time2E4     95% CI
                                               w/out Weight Waiver
ASVAB Subscore   IST_CRUNCHES, IST_RUN,        787.2                 [526.6, 1380.3]
                 RIFLE_SCORE, GS, MK, PC,
                 WAIV_WEIGHT
ASVAB Composite  IST_CRUNCHES, IST_RUN,        794.9                 [527.6, 1415.3]
Score            RIFLE_SCORE, CL_SCORE,
                 WAIV_WEIGHT


Table 13 shows that the two models produce similar outputs. Both models find IST_CRUNCHES,
IST_RUN, RIFLE_SCORE, and WAIV_WEIGHT to be statistically significant for inclusion.
The direction of each of these relationships is the same in both models in terms of increasing or
decreasing the predicted value of the dependent variable. Each model provides similar
predictions and 95 percent confidence intervals for the predicted time2E4.

D.  EXPLORATION OF COMBINING THE DATA INTO A MULTI-YEAR
MODEL  (FY2008-FY2010) 

This  section  of  the  analysis  explores  the  possibility  of  pooling  the  data  from  each 
year  into  one  complete  data  set  of  Marines  with  the  0621  MOS  from  FY2008  through 
FY2010.  Pooling  the  data  into  a  multi-year  study  allows  us  to  determine  if  the  entry-level 
attributes  are  consistently  predictive  over  time.  The  breakdown  of  the  number  of 
observations  used  by  year  is  shown  in  Table  14.  The  total  number  of  observations 
included  in  the  model  is  1,126. 


Table 14.  Summary of the number of observations used by year

Fiscal Year   Number of Observations
2008          351
2009          354
2010          421
Total         1,126

1.  Evaluation of the Regression Model

We begin with 18 possible predictor variables, as listed in Table 5, excluding the
ASVAB subscores. Based on an application of the Box-Cox procedure, the dependent
variable, time2E4, was transformed by being raised to the power -0.3. In order to
determine if the data from individual fiscal years can be pooled to fit a common model,
we add the fiscal year as a categorical variable and run the regression model. The results
of fitting the linear regression model are displayed in Figure 13.


lm(formula = (time2E4)^(-0.3) ~ year + Age + GENDER + WEIGHT +
    IST_CRUNCHES + IST_RUN + RIFLE_SCORE + WAIV_WEIGHT + CL_SCORE,
    data = Master, subset = tt.noseps)

Residuals:
      Min        1Q    Median        3Q       Max
-0.030314 -0.007128  0.001173  0.007072  0.042389

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.294e-01  7.020e-03  18.426  < 2e-16 ***
year09          -5.718e-03  1.108e-03  -5.159 2.94e-07 ***
year10          -7.218e-03  1.107e-03  -6.518 1.08e-10 ***
Age              6.490e-04  1.870e-04   3.470 0.000540 ***
GENDERM         -2.713e-03  1.374e-03  -1.975 0.048554 *
WEIGHT          -3.628e-05  1.457e-05  -2.490 0.012910 *
IST_CRUNCHES     4.975e-05  2.117e-05   2.350 0.018953 *
IST_RUN         -2.462e-05  5.229e-06  -4.708 2.82e-06 ***
RIFLE_SCORE      5.167e-05  1.286e-05   4.017 6.30e-05 ***
WAIV_WEIGHTTRUE -3.741e-03  1.453e-03  -2.575 0.010155 *
CL_SCORE         5.879e-05  1.717e-05   3.424 0.000639 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01165 on 1115 degrees of freedom
Multiple R-squared:  0.09637,  Adjusted R-squared:  0.08826
F-statistic: 11.89 on 10 and 1115 DF,  p-value: < 2.2e-16


Figure 13.  Multi-year model including year variable


The model shown in Figure 13 includes the individual fiscal years as
significant predictors in the regression. This reveals that the year variable provides
statistically significant information in predicting a Marine’s time to achieve the pay grade
of E-4. The regression coefficients for year09 and year10 have significant effects on the
dependent variable. This result argues against pooling the data from different years to fit
a common model.

The descriptive statistics for the six quantitative variables included in the model
are displayed in Table 15.


Table 15.  Descriptive statistics for the quantitative variables
used in multi-year model

Variable       Mean     Median   Standard    Minimum   Maximum
                                 Deviation
Age            20.13    19.62    1.87        17.28     30.03
WEIGHT         163.10   161.00   27.64       96        259
IST_CRUNCHES   70.00    67.00    18.32       39        155
IST_RUN        706.60   717.00   79.62       450       918
RIFLE_SCORE    270.3    282.0    20.28       248       332
CL_SCORE       99.07    101.00   21.52       85        140

Further support for not pooling the data from different years to fit a common
model can be found by inspecting side-by-side boxplots of the residuals from the
regression broken down by year, as shown in Figure 14.

Figure 14.  Comparison of regression errors across three years of data
using boxplots with time2E4 as the outcome variable

As seen in the comparison of the boxplots, the variance of the regression errors
decreases from 2008 to 2009, and then again from 2009 to 2010. The regression errors do
not exhibit constant variance and violate the basic model assumptions. This decrease in
the variance of the regression errors possibly indicates that the accuracy of the data
improves across the years, and could be explained simply by the changing Marine Corps
policies from year to year for Marine recruitment or changing promotion requirements.
We conclude that the individual data sets or possibly the relationships are not
homogeneous across years. Most importantly, this exercise suggests that this analysis
should be repeated on an annual basis, and not pooled into a multi-year study, at least into
the near future.
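
The visual comparison of boxplots can be complemented by computing the residual variance within each fiscal year. The Python sketch below shows one way to do so. The residual values are hypothetical, constructed only so that the spread shrinks from year to year as described above; they are not the thesis data, and the function name is illustrative.

```python
from collections import defaultdict
from statistics import variance


def residual_variance_by_group(groups, residuals):
    """Sample variance of the regression residuals within each group."""
    buckets = defaultdict(list)
    for group, residual in zip(groups, residuals):
        buckets[group].append(residual)
    return {group: variance(values) for group, values in buckets.items()}


# Hypothetical residuals on the transformed scale, five per fiscal year.
years = ["FY2008"] * 5 + ["FY2009"] * 5 + ["FY2010"] * 5
residuals = [
    -0.030, -0.012, 0.001, 0.014, 0.027,   # FY2008: widest spread
    -0.020, -0.008, 0.000, 0.009, 0.019,   # FY2009
    -0.010, -0.004, 0.001, 0.005, 0.008,   # FY2010: narrowest spread
]

by_year = residual_variance_by_group(years, residuals)
ratio = max(by_year.values()) / min(by_year.values())

for year, var in by_year.items():
    print(f"{year}: {var:.6f}")
print(f"largest / smallest variance: {ratio:.1f}")
```

A ratio well above one between the largest and smallest group variances points to non-constant error variance across years; a formal test of equal variances would be the natural next step before deciding whether pooling is defensible.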

E.  CHAPTER  SUMMARY 

This chapter provides a detailed explanation of the four models created in order to
study the relationships between entry-level attributes of Marine recruits with the 0621
MOS and two dependent variables: the Computed Tier Score and time2E4. Statistically
significant relationships between both dependent variables and the entry-level attributes
are found to exist.

V.  CONCLUSIONS AND RECOMMENDATIONS


A.  CONCLUSIONS

This  thesis  develops  multivariate  linear  regression  models  to  identify  the  most 
important  determinants  of  a  Marine’s  advancement  to  the  pay  grade  of  E-4  within  the 
0621  Field  Radio  Operator  MOS.  Further,  we  determine  that  these  models  have 
statistically  significant  predictive  power  for  a  Marine’s  Computed  Tier  Score  at  the  time 
of  eligibility  for  re-enlistment.  We  present  evidence  that  these  studies  should  be  repeated 
on  an  annual  basis  vice  pooling  the  data  into  multi-year  studies.  Specifically,  four 
questions  are  considered  in  our  analysis,  which  are  presented  in  this  section  with  our 
findings. 

1.  Do  significant  relationships  exist  between  entry-level  attributes  of  a  USMC 
recruit  and  the  USMC  Computed  Tier  Score  or  the  time  for  a  Marine  to 
achieve  the  pay  grade  of  E-4? 

This study has determined that there are statistically significant relationships
between the entry-level attributes of a Marine recruit and the USMC Computed Tier
Score, as well as the time to achieve the pay grade of E-4 within the 0621 MOS in the
USMC. Entry-level attributes of Marine recruits can be used to predict these
dependent variables.

2.  What are the most influential independent variables that predict the
Computed Tier Score and the rate of promotion to E-4 in the 0621 MOS?

The most influential independent predictor variables for the
Computed Tier Score are found to be IST_RUN, WAIV_WEIGHT, IST_CRUNCHES,
GT_SCORE, and WEIGHT. The predicted value of Computed Tier Score increases as
IST_RUN and WEIGHT decrease, or as IST_CRUNCHES and GT_SCORE increase.
Of particular interest, CL_SCORE and MM_SCORE exhibit a decreasing relationship
with the Computed Tier Score. This does not imply that doing well on these tests
should be a negative factor in evaluating a Marine, but it does suggest that relationships
between the predictor variables may lead to a statistical result of this kind.

As shown in Table 13 (Chapter IV), IST_CRUNCHES, IST_RUN,
RIFLE_SCORE, GS, MK, PC, WAIV_WEIGHT, and CL_SCORE are the most
influential predictor variables used to determine success as defined in terms of the time to
achieve the pay grade of E-4. RIFLE_SCORE is the most influential predictor variable
that has a beneficial relationship to time2E4, while receiving a weight waiver prior to
entering service has the largest negative effect. RUN_TIME, MK, PC, and CL_SCORE
follow RIFLE_SCORE as providing positive impact on the predicted time2E4, all having
a similarly influential effect.

3.  What  insight  does  this  analysis  provide  in  terms  of  recommending  changes  to 
the  current  entrance  criteria  for  the  0621  Field  Radio  Operator  MOS? 

While IST_CRUNCHES, IST_RUN, RIFLE_SCORE, and WEIGHT provide
insight into the predicted time2E4, the relationships of time2E4 with GS, MK, PC, and
CL_SCORE merit further exploration for inclusion in the entrance criteria of a Field
Radio Operator. Interestingly, EL_SCORE, which is currently used as one of the criteria
for entry into the 0621 MOS, was not found to have a statistically significant relationship
to time2E4. This does not indicate that EL_SCORE is not a significant measure of
suitability to the 0621 MOS, but rather that other ASVAB scores may provide similar
information in predicting time2E4.

4.  What direction should a future study take to examine ways in which the
matching of USMC recruits to MOS fields can be improved?

Exploring other measures of MOS suitability that could help predict a successful
match requires the development of new suitability measures. As explained in
Chapter II, the Center for Naval Analyses (CNA) developed job performance measures
for a limited number of MOSs in order to test proficiency in performing duties as
outlined by the USMC. We recommend that similar job performance measures be created
across all high-density MOSs in order to support studies focused on matching a USMC
recruit to his or her MOS. This study can then be replicated using a metric that is
focused on the quality of matching as the dependent variable for analysis.

B. RECOMMENDATIONS FOR FUTURE WORK

Based on the findings in this study, the following future work is suggested to
expand this field of research and the scope of our findings.

The models and methodologies used in this study should be extended to other
high-density MOSs within the USMC. With a better understanding of the influential
predictors within each MOS, further recommendations can be made for each MOS
considered. Further, an optimization of the placement of a selected pool of Marines
into the MOSs that need to be filled would provide the USMC with a tool to improve
the quality of matching available Marines to the MOSs.
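
The placement optimization suggested above can be framed as an assignment problem: given a predicted outcome for every Marine in every candidate MOS, choose the one-to-one assignment with the best total. The sketch below is a toy illustration under stated assumptions, not a proposed tool. The cost matrix holds made-up predicted months to E-4, and the Marine and MOS indices are hypothetical. Exhaustive search is used only because the example is tiny; a realistic pool would need a dedicated method such as the Hungarian algorithm or an integer program.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (assignment, total) minimizing the summed cost.

    cost[i][j] is the predicted cost of placing Marine i in MOS j.
    assignment[i] is the MOS index given to Marine i. Every ordering is
    tried, so this is only practical for very small problems.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total


# Made-up predicted months to E-4 (rows: Marines, columns: MOSs).
cost = [
    [30, 31, 40],
    [29, 36, 41],
    [35, 34, 33],
]

assignment, total = best_assignment(cost)
for marine, mos in enumerate(assignment):
    print(f"Marine {marine} -> MOS {mos} ({cost[marine][mos]} months)")
print(f"Total predicted months: {total}")
```

In this toy matrix, Marine 0 has the lowest individual prediction in MOS 0, yet the best overall assignment places Marine 0 in MOS 1 so that Marine 1 can take MOS 0. This is why a joint optimization can outperform placing each Marine independently.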

The USMC Manpower Database, TFDW, is a vast resource of data that can be used to
support future studies. Collecting data from TFDW, however, requires extensive
knowledge of the system, which is not user-friendly. Improving the database so that
it is user-friendly and readily accessible would be a significant advantage to those
using TFDW for data analysis. More specifically, the development of a complete and
more detailed data dictionary and user interface would improve the availability of
data.

Further  exploration  and  development  of  new  performance  and  suitability 
measures  could  provide  useful  results  when  analyzing  the  influence  of  various  predictor 
variables.  With  the  development  of  standardized  performance  metrics,  this  study  could 
then  be  expanded  and  provide  further  insight  into  the  job  matching  problem. 